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ABSTRACT 

Eigenvalues  and  eigenfunctions  of  linear  operators  are  important  to  many  areas  of  ap¬ 
plied  mathematics.  The  ability  to  approximate  these  quantities  numerically  is  becoming 
increasingly  important  in  a  wide  variety  of  applications.  This  increasing  demand  has  fu¬ 
eled  interest  in  the  development  of  new  methods  and  software  for  the  numerical  solution 
of  large-scale  algebraic  eigenvalue  problems.  In  turn,  the  existence  of  these  new  methods 
and  software,  along  with  the  dramatically  increased  computational  capabilities  now  avail¬ 
able,  has  enabled  the  solution  of  problems  that  would  not  even  have  been  posed  five  or  ten 
years  ago.  Until  very  recently,  software  for  large-scale  nonsymmetric  problems  was  virtually 
non-existent.  Fortunately,  the  situation  is  improving  rapidly. 

The  purpose  of  this  article  is  to  provide  an  overview  of  the  numerical  solution  of  large- 
scale  algebraic  eigenvalue  problems.  The  focus  will  be  on  a  class  of  methods  called  Krylov 
subspace  projection  methods.  The  well-known  Lanczos  method  is  the  premier  member  of 
this  class.  The  Arnold!  method  generalizes  the  Lanczos  method  to  the  nonsymmetric  case. 
A  recently  developed  variant  of  the  Arnoldi/Lanczos  scheme  called  the  Implicitly  Restarted 
Arnoldi  Method  is  presented  here  in  some  depth.  This  method  is  highlighted  because  of  its 
suitability  as  a  basis  for  software  development. 


JThis  work  was  supported  in  part  by  NAS1-19480  while  the  author  was  in  residence  at  the  Institute  for 
Computer  Applications  in  Science  and  Engineering  (ICASE),  NASA  Langley  Research  Center,  Hampton, 
VA  23681-0001. 


1.  Introduction 

Discussion  begins  with  a  brief  synopsis  of  the  theory  and  the  basic  iterations  suitable 
for  large-scale  problems  to  motivate  the  introduction  of  Krylov  subspaces.  Then  the  Lanc- 
zos/Arnoldi  factorization  is  introduced,  along  with  a  discussion  of  its  important  approx¬ 
imation  properties.  Spectral  transformations  are  presented  as  a  means  to  improve  these 
approximation  properties  and  to  enhance  convergence  of  the  basic  methods.  Restarting  is 
introduced  as  a  way  to  overcome  intractable  storage  and  computational  requirements  in  the 
original  Arnoldi  method.  Implicit  restarting  is  a  new  sophisticated  variant  of  restarting. 
This  new  technique  may  be  viewed  as  a  truncated  form  of  the  powerful  implicitly  shifted 
QR  technique  that  is  suitable  for  large-scale  problems.  Implicit  restarting  provides  a  means 
to  approximate  a  few  eigenvalues  with  user  specified  properties  in  space  proportional  to  nk , 
where  k  is  the  number  of  eigenvalues  sought,  and  n  is  the  problem  size. 

Generalized  eigenvalue  problems  are  discussed  in  some  detail.  They  arise  naturally  in 
PDE  applications  and  they  have  a  number  of  subtleties  with  respect  to  numerically  stable 
implementation  of  spectral  transformations. 

Software  issues  and  considerations  for  implementation  on  vector  and  parallel  computers 
are  introduced  in  the  later  sections.  Implicit  restarting  has  provided  a  means  to  develop 
very  robust  and  efficient  software  for  a  wide  variety  of  large-scale  eigenproblems.  A  public 
domain  software  package  called  ARPACK  has  been  developed  in  Fortran  77.  This  package 
has  performed  well  on  workstations,  parallel-vector  supercomputers,  distributed-memory 
parallel  systems  and  clusters  of  workstations.  The  features  of  this  package  along  with  some 
applications  and  performance  indicators  occupy  the  final  section  of  this  paper. 

2.  Eigenvalues,  Power  Iterations,  and  Spectral  Transformations 

A  brief  discussion  of  the  mathematical  structure  of  the  eigenvalue  problem  is  necessary 
to  fix  notation  and  introduce  ideas  that  lead  to  an  understanding  of  the  behavior,  strengths 
and  limitations  of  the  algorithms.  In  this  discussion,  the  real  and  complex  number  fields  are 
denoted  by  R  and  C,  respectively.  The  standard  n-dimensional  real  and  complex  vectors 
are  denoted  by  Rn  and  Cn  and  the  symbols  Rmxn  and  CmXn  will  denote  the  real  and 
complex  matrices  m  rows  and  n  columns.  Scalars  are  denoted  by  lower  case  Greek  letters, 
vectors  are  denoted  by  lower  case  Latin  letters  and  matrices  by  capital  Latin  letters.  The 
transpose  of  a  matrix  A  is  denoted  by  AT  and  the  conjugate-transpose  by  AH .  The  symbol, 
||  •  ||  will  denote  the  Euclidean  or  2-norm  of  a  vector.  The  standard  basis  of  Cn  is  denoted 
by  the  set 

The  set  of  numbers  <r(A)  =  {A  6  C  :  rank(A  -  XI)  <  n)}  is  called  the  spectrum  of  A . 
The  elements  of  this  discrete  set  are  the  eigenvalues  of  A  and  they  may  be  characterized  as 
the  n  roots  of  the  characteristic  polynomial  PaW  =  det(XI  —  A),  Corresponding  to  each 
distinct  eigenvalue  A  €  <J (A)  is  at  least  one  nonzero  vector  x  such  that  Ax  —  xX.  This  vector 
is  called  a  right  eigenvector  of  A  corresponding  to  the  eigenvalue  A.  The  pair  (x,  A)  is  called 
an  eigenpair.  A  nonzero  vector  y  such  that  yH A  =  A yH  is  called  a  left  eigenvector .  The 
multiplicity  na(A)  of  A  as  a  root  of  the  characteristic  polynomial  is  the  algebraic  multiplicity 
and  the  dimension  n^(A)  of  Null(XI  —  A)  is  the  geometric  multiplicity  of  A.  A  matrix  is 
defective  if  ng( A)  <  na(A)  and  otherwise  A  is  nondefective .  The  eigenvalue  A  is  simple  if 
na(A)  =  1. 
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A  subspace  S  of  Cnxn  is  called  an  invariant  subspace  of  A  if  AS  C  S.  It  is  straightfor¬ 
ward  to  show  if  A  €  CnXn  ,  X  £  Cnxfc  and  B  £  Ckxk  satisfy 


AX  =  XB,  (1) 

then  S  =  Range(X)  is  an  invariant  subspace  of  A.  Moreover,  if  X  has  full  column  rank 
k  then  the  columns  of  X  form  a  basis  for  this  subspace  and  a(B)  C  cr(A).  If  k  =  n  then 
a(B)  =  a(A )  and  A  is  said  to  be  similar  to  B  under  the  similarity  transformation  X. 
A  is  diagonalizable  if  it  is  similar  to  a  diagonal  matrix  and  this  property  is  equivalent  to 
A  being  nondefective. 

An  extremely  important  theorem  to  the  study  of  numerical  algorithms  for  eigenproblems 
is  the  Schur  decomposition.  It  states  that  every  square  matrix  is  unitarily  similar  to  an 
upper  triangular  matrix.  In  other  words,  for  any  linear  operator  on  Cn,  there  is  a  unitary 
basis  in  which  the  operator  has  an  upper  triangular  matrix  representation. 

Theorem  1  (Schur  Decomposition).  Let  A  £  Cnxn .  Then  there  is  a  unitary  matrix  Q  and 
an  upper  triangular  matrix  R  such  that 

AQ  =  QR.  (2) 

The  diagonal  elements  of  R  are  the  eigenvalues  of  A. 


From  the  Schur  decomposition,  the  fundamental  structure  of  Hermitian  and  normal  matrices 
is  easily  exposed: 

Lemma  2  A  matrix  A  £  Cnxn  is  normal  (  AAH  =  AH A  )  if  and  only  if  A  =  QAQH  with 
q  g  Qnxn  unnary  and  A  £  CnXn  diagonal.  A  matrix  A  £  CnXn  is  Hermitian  (  A  =  AH  )  if 
and  only  if  A  =  QAQH  with  Q  £  CnXn  unitary  and  A  £  RnX”  diagonal.  In  either  case,  the 
diagonal  entries  of  A  are  the  eigenvalues  of  A  and  the  columns  of  Q  are  the  corresponding 
eigenvectors. 

The  proof  follows  easily  through  substitution  of  the  Schur  decomposition  in  place  of  A  in 
each  of  the  defining  relationships.  The  columns  of  Q  are  called  Schur  vectors  in  general  and 
these  are  eigenvectors  of  A  if  and  only  if  A  is  normal. 

For  purposes  of  algorithmic  development  this  structure  is  fundamental.  In  fact,  the  well 
known  Implicitly  Shifted  QR-Algorithm  (Francis,  1961)  is  designed  to  produce  a  sequence 
of  unitary  similarity  transformations  Qj  that  iteratively  reduce  A  to  upper  triangular  form. 
This  algorithm  begins  with  an  initial  unitary  similarity  transformation  V  of  A  to  the  con¬ 
densed  form  AV  =  VH,  where  H  is  upper  Hessenberg  (tridiagonal  in  case  A  -  AH  ).  Then 
the  following  iteration  is  performed: 

where  Q  is  unitary  and  R  is  upper  triangular  (i.e.,  the  QR  factorization  of  H  -  pi  ).  It 
is  easy  to  see  that  H  is  unitarily  similar  to  A  throughout  the  course  of  this  iteration.  The 
iteration  is  continued  until  the  subdiagonal  elements  of  H  converge  to  zero,  i.e.  until  a 
Schur  decomposition  has  been  (approximately)  obtained.  In  the  standard  implicitly  shifted 
Q R-iteration,  the  unitary  matrix  Q  is  never  actually  formed,  it  is  computed  indirectly  as 
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Algorithm  1:  Implicitly  Shifted  QR- iteration 


Input:  (A,  V,  H  )  with  AV  =  VH,  VHV  =  I  ,  H  upper  Hessenberg; 
For  j  =  1,2,3, ...  until  convergence , 

(al.l)  Select  a  shift  n  nj 
(al.2)  Factor  [Q,  £]  =  qr(H  —  ^/)  ; 

(al.3)  H  <-  QaHQ  ;  V  -  VQ; 

End.For 


a  product  of  2  x  2  Givens  or  3  x  3  Householder  transformations  through  a  “bulge  chase” 
process.  The  elegant  details  of  an  efficient  and  stable  implementation  would  be  too  much 
of  a  digression  here.  They  may  be  found  in  (Golub  and  Van  Loan,  1983).  The  convergence 
behavior  of  this  iteration  is  fascinating.  The  columns  of  V  converge  to  Schur  vectors  at 
various  rates.  These  rates  are  fundamentally  linked  to  the  simple  power  method  and  its 
rapidly  convergent  variant,  inverse  iteration  (see  Watkins  and  Eisner,  1991). 

Despite  the  extremely  fast  rate  of  convergence  and  the  efficient  use  of  storage,  the 
implicitly  shifted  QR  method  is  not  suitable  for  large-scale  problems  and  it  has  proved  to  be 
extremely  difficult  to  parallelize.  Large-scale  problems  are  typically  sparse  or  structured  so 
that  a  matrix- vector  product  w  <-  Av  may  be  computed  with  time  and  storage  proportional 
to  n  rather  than  n2  .  A  method  based  upon  full  similarity  transformations  quickly  destroys 
this  structure.  Storage  and  operation  counts  become  order  n2.  Hence,  there  is  considerable 
motivation  for  methods  that  only  require  matrix- vector  products  with  the  original  A. 


2.1.  Single  vector  power  iterations 

Probably  the  oldest  algorithm  for  approximating  eigenvalues  and  corresponding  eigen¬ 
vectors  of  a  matrix  is  the  power  method.  This  method  is  an  important  tool  in  its  own  right 
when  conditions  are  appropriate.  It  is  very  simple  and  only  requires  matrix-vector  products 
along  with  two  vectors  of  storage.  In  addition  to  its  role  as  an  algorithm,  the  method  is 
central  to  the  development,  understanding,  and  convergence  analysis  of  all  of  the  iterative 
methods  discussed  here. 

Algorithm  2:  The  Power  Method 


Input:  (A,  v0  ) 

Put  v  =  »o/H»o||oo; 

For  j  —  1,2,  3, ...  until  convergence, 
(a2.1)  w  <—  Av; 

(*2.2)  A  =  fwf; 

(a2.3)  i  =  i_max  ( w ); 

(a2.4)  v  <—  v/(ejw)  ; 

End.For 
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At  Step  (a2.3),  i  is  the  index  of  the  element  of  w  with  largest  absolute  value.  It  is  easily 
seen  that  the  contents  of  v  after  &-steps  of  this  iteration  will  be  the  vector 


Vk  ~  iejA>'v0)AkVo  ~  {ej%v0){pkAkVo) 


for  any  nonzero  scalar  pk.  In  particular,  this  iteration  may  be  analyzed  as  if  the  vectors  had 
been  scaled  by  pk  =  A*  at  each  step,  with  Ai  an  eigenvalue  of  A  with  largest  magnitude. 
If  A  is  diagonalizable  with  eigenpairs  {(%j,  \j),  1  <  j  <  n)  and  v0  has  the  expansion 
v0  =  Y,]=t.  xjlj  in  this  basis  then 


-tfAkv0  =  X  Akxm  =  X  Zj(VAi)fc7j-  (3) 

A -i  A- 1  - 

1  1  J—l  J-l 


If  A  i  is  a  simple  eigenvalue  then 

It  follows  that  vk  — ►  x\ /(efxi),  where  i  =  i_max  (xj),  at  a  linear  rate  with  a  convergence 
factor  of  |  . 

While  the  power  method  is  useful,  it  has  two  obvious  drawbacks.  Convergence  may  be 
arbitrarily  slow  or  may  not  happen  at  all.  Only  one  eigenvalue  and  corresponding  vector 
can  be  found. 


2.2.  Spectral  transformations 

The  basic  power  iteration  may  be  modified  to  overcome  these  difficulties.  The  most  fun¬ 
damental  modification  is  to  employ  a  spectral  transformation.  Spectral  transformations 
are  generally  based  upon  the  following: 

Let  A  €  CnXn  have  an  eigenvalue  A  with  corresponding  eigenvector  x. 

1.  Let  p(r )  =  7o  +  7i r  +  72 r 2  +  . . .  +  7jt,ri.  Then  p( A)  is  an  eigenvalue  of  the  matrix 
p(A)  =  70/  +  71 A  +  72A2  +  . .  .  +  7 kAk  with  corresponding  eigenvector  x  (i.e.  p(A)x  = 

xp( A)  )• 

2.  If  r(r)  =  |^| ,  where  p  and  q  are  polynomials  with  q(A)  nonsingular,  define  r(A)  = 
[9(A)]-1p(A).  Then  r(A)  is  an  eigenvalue  of  r(A)  with  corresponding  eigenvector  x. 

It  is  often  possible  to  construct  a  polynomial  or  rational  function  <f>(r)  such  that 

|<KA*)I  <  l^(Ai)l  for  1  <  J  <  n,  J  7^  h 

where  A ;  is  an  eigenvalue  of  particular  interest.  This  is  called  a  spectral  transformation  since 
the  eigenvectors  of  the  transformed  matrix  <f>(A)  remain  the  same,  but  the  corresponding 
eigenvalues  A  j  are  transformed  to  Applying  the  power  method  with  <fi(A)  in  place 

of  A  will  then  produce  the  eigenvector  q  =  X{  corresponding  to  A;  at  a  linear  convergence 
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rate  with  a  convergence  factor  of  <C  1.  Once  the  eigenvector  has  been  found,  the 

eigenvalue  A  =  A,  may  be  calculated  directly  from  a  Rayleigh  Quotient  A  =  qH Aq/qHq. 

2.3.  Inverse  iteration 

Spectral  transformation  can  lead  to  dramatic  enhancement  of  the  convergence  of  the 
power  method.  Polynomial  transformations  may  be  applied  using  only  matrix- vector  prod¬ 
ucts.  Rational  transformations  require  the  solution  of  linear  systems  with  the  transformed 
matrix  as  the  coefficient  matrix.  The  simplest  rational  transformation  turns  out  to  be 
very  powerful  and  is  almost  exclusively  used  for  this  purpose.  If  \i  £  cf{A)  then  A  -  fil  is 
invertible  and  cr([A  -  =  { 1  /(A  -  jtt)  :  A  €  <j(A )}  .  This  transformation  is  very  suc¬ 

cessful  since  eigenvalues  near  the  shift  //  are  transformed  to  extremal  eigenvalues  which  are 
well  separated  from  the  other  ones  while  the  original  extremal  eigenvalues  are  transformed 
near  the  origin.  Hence  under  this  transformation  the  eigenvector  q  corresponding  to  the 
eigenvalue  of  A  that  is  closest  to  fi  may  be  readily  found  and  the  corresponding  eigenvalue 
may  obtained  either  through  the  formula  A  =  0  +  l//x,  where  6  is  the  eigenvalue  of  the 
transformed  matrix,  or  it  may  be  calculated  directly  from  a  Rayleigh  quotient. 

Algorithm  3:  The  Inverse  Power  Method 


Input:  ( A ,  v0,  y  ) 

Put  V  —  t>o/||Vo||oo; 

For  j  =  1,2,3, ...  until  convergence , 

(a3.1)  Solve  (A  —  fil)w  =  v\ 

(a3.2)A  =  „  +  ^; 

(a3.3)  i  =  i-max  (w); 

(a3.4)  v  v/(ef  w)  ; 

End.For 

Observe  that  the  formula  for  A  at  Step  (a3.2)  is  equivalent  to  forming  A  =  (wH  Aw) / (wH w) 
so  an  additional  matrix  vector  product  is  not  necessary  to  obtain  the  Rayleigh  quotient  esti¬ 
mate.  The  analysis  of  convergence  remains  entirely  in  tact.  This  iteration  converges  linearly 
with  the  convergence  factor 

[Ai  -Ml 
|A2-/*|’ 

where  the  eigenvalues  of  A  have  been  re-indexed  so  that  |Aj  -  n\  <  |A2  —  n\  <  |A3  —  /x|  < 
...  <  | An  —  fi\.  Hence,  the  convergence  becomes  faster  as  fJ.  gets  closer  to  A,. 

This  result  is  encouraging  but  still  leaves  us  wondering  how  to  select  the  shift  //  to  be 
close  to  the  unknown  eigenvalue  we  are  trying  to  compute.  In  many  applications  the  choice 
is  apparent  from  the  requirements  of  the  problem.  It  is  also  possible  to  change  the  shift 
at  each  iteration  at  the  expense  of  a  new  matrix  factorization  at  each  step.  An  obvious 
choice  would  be  to  replace  the  shift  with  the  current  Rayleigh  quotient  estimate.  This 
method,  called  Rayleigh  Quotient  (RQ)  iteration,  has  very  impressive  convergence  rates 
indeed.  Rayleigh  Quotient  Iteration  converges  at  a  quadratic  rate  in  general  and  at  a  cubic 
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rate  on  Hermitian  problems.  For  a  more  detailed  discussion  of  the  eigenvalue  problem  and 
basic  algorithms  see  (Golub  and  Van  Loan,  1983,  Stewart,  1973,  and  Wilkinson,  1965). 

3.  Krylov  Subspaces  and  Projection  Methods 

Although  the  rate  of  convergence  can  be  improved  to  an  acceptable  level  through  spec¬ 
tral  transformations,  power  iterations  are  only  able  to  find  one  eigenvector  at  a  time.  If 
more  vectors  are  sought,  then  various  deflation  techniques  (such  as  orthogonalizing  against 
previously  converged  eigenvectors)  and  shift  strategies  must  be  introduced.  One  alternative 
is  to  introduce  a  block  form  of  the  simple  power  method  which  is  often  called  subspace  iter¬ 
ation.  This  important  class  of  algorithms  has  been  developed  and  investigated  in  (Stewart, 
1973).  Several  software  efforts  have  been  based  upon  this  approach  (Bai  and  Stewart,  1992, 
Duff  and  Scott,  1993,  and  Stewart  and  Jennings,  1992).  However,  there  is  another  class 
of  algorithms  called  Krylov  subspace  projection  methods  that  are  based  upon  the  intricate 
structure  of  the  sequence  of  vectors  naturally  produced  by  the  power  method. 

An  examination  of  the  behavior  of  the  power  sequence  as  exposed  in  equation  (3)  hints 
that  the  successive  vectors  produced  by  a  power  iteration  may  contain  considerable  infor¬ 
mation  along  eigenvector  directions  corresponding  to  eigenvalues  other  than  the  one  with 
largest  magnitude.  The  expansion  coefficients  of  the  vectors  in  the  power  sequence  evolve 
in  a  very  structured  way.  Therefore,  linear  combinations  of  the  these  vectors  might  well  be 
devised  to  expose  additional  eigenvectors.  A  single  vector  power  iteration  simply  ignores 
this  additional  information,  but  more  sophisticated  techniques  may  be  employed  to  extract 
it. 

If  one  hopes  to  obtain  additional  information  through  various  linear  combinations  of  the 
power  sequence,  it  is  natural  to  formally  consider  the  Krylov  subspace 

ICk(A,v r)  =  Span  A2v1,. . ., 


and  to  attempt  to  formulate  the  best  possible  approximations  to  eigenvectors  from  this 
subspace. 

It  is  reasonable  to  construct  approximate  eigenpairs  from  this  subspace  by  imposing  a 
Galerkin  condition:  A  vector  x  G  Kk{A,v i)  is  called  a  Ritz  vector  with  corresponding  Ritz 
value  0  if  the  Galerkin  condition 

<  w,  Ax  -  xO  >=  0  ,  for  all  w  G  K.k{A,v i) 

is  satisfied.  There  are  some  immediate  consequences  of  this  definition:  Let  W  be  a  matrix 
whose  columns  form  an  orthonormal  basis  for  ICk  =  ICk(A,vi).  Let  V  =  WWH  denote  the 
related  orthogonal  projector  onto  Kk  and  define  A  =  VAV  =  WBWH ,  where  B  =  WH  AW. 
It  can  be  shown  that 

Lemma  3  For  the  quantities  defined  above: 

1.  (x,0)  is  a  Ritz-pair  if  and  only  if  x  =  Wy  with  By  =  yd  . 

2.  ||(I  -  V)AW\\  =  ||( A  -  A)W\\  <  || (A  -  M)W\\ 
for  all  M  G  CnXn  such  that  MKk  C  K,k. 
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3 .  The  Ritz-pairs  (x,0)  and  the  minimum  value  of\\(I  -  V)AW\\  are  independent  of  the 
choice  of  orthonormal  basis  W . 

Item  (1)  follows  immediately  from  the  Galerkin  condition  since  it  implies  that  0  =  WH(AWy — 
Wy0 )  =  By  —  y0.  Item  (2)  is  easily  shown  using  invariance  of  ||  •  ||  under  unitary  transfor¬ 
mations.  Item  (3)  follows  from  the  fact  that  V  is  an  orthonormal  basis  for  1C k  if  and  only  if 
V  —  WQ  for  some  k  x  k  unitary  matrix  Q.  With  this  change  of  basis  A  —  VHVh ,  where 
H  =  VH AV  —  QH BQ.  Since  H  is  unitarily  similar  to  £?,  the  Ritz-values  remain  the  same 
and  the  Ritz- vectors  are  of  the  form  x  =  Wy  —  Vy,  where  y  =  QHy. 

These  facts  are  actually  valid  for  any  k  dimensional  subspace  $  in  place  of  /C^.  The 
following  properties  are  consequences  of  the  fact  that  every  w  6  ICk  is  of  the  form  w  — 
<j)(A)v i  for  some  polynomial  4>  of  degree  less  than  k. 

Lemma  4  For  the  quantities  defined  above: 

1.  If  q  is  a  polynomial  of  degree  less  than  k  then 

=  g(i)t7i  =  Wq(B)zi, 

where  v\  =  Wz i,  and  if  the  degree  of  q  is  k  then 

Vq(A)v  i  =  q(A)vi. 

2.  If  p(\)  =  det(\I  —  B)  is  the  characteristic  polynomial  of  B  then  p(A)  =  0  and 
||p(j4)?;i||  <  ||g(A)ui||  for  all  monic  polynomials  of  degree  k. 

3.  If  y  is  any  vector  in  Ck  then  AWy  -  WBy  =  'jp(A)vi  for  some  scalar  7. 

4-  If  (x,0)  is  any  Ritz-pair  for  A  with  respect  to  ICk  then 

Ax  -  xO  =  ■yp(A)vi 


for  some  scalar  7. 

This  discussion  follows  the  treatment  given  by  Saad  in  (Saad,  1992)  and  in  his  earlier 
papers.  While  these  facts  may  seem  esoteric,  they  have  important  algorithmic  consequences. 
First,  it  should  be  noted  that  ICk  is  an  invariant  subspace  for  A  if  and  only  if  iq  =  Vy. 
where  AV  =  VR  with  VHV  =  Ik  and  R  is  k  x  k  upper  triangular.  Also,  ICy  is  an  invariant 
subspace  for  A  if  vi  =  Xy,  where  X  6  Cnxk  and  AX  =  XA  with  A  diagonal.  This  follows 
from  items  (2)  and  (3)  since  there  is  a  fc- degree  monic  polynomial  q  such  that  q(R)  =  0  and 
hence  ||p(A)ni||  <  ||g(A)wi||  =  ||Fg(J?)y||  =  0.  (A  similar  argument  holds  when  rq  =  Xy). 
Secondly,  there  is  some  algorithmic  motivation  to  seek  a  convenient  orthonormal  basis 

V  =  WQ  that  will  provide  a  means  to  successively  construct  these  basis  vectors.  It  is 
possible  to  construct  a  kxk  unitary  Q  using  standard  Householder  transformations  such  that 

VI  =  V e\  and  H  =  QH BQ  is  upper  Hessenberg  with  non-negative  subdiagonal  elements. 
It  is  also  possible  to  show  using  item  (3)  that  in  this  basis, 

AV  =  VH  +  fe'k,  where  /  =  7p(A)iq 
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and  VH  f  =  0  follows  from  the  Galerkin  condition. 

The  first  observation  shows  that  if  it  is  possible  to  obtain  a  v\  as  a  linear  combination 
of  k  eigenvectors  of  A  then  /  =  0  and  V  is  an  orthonormal  basis  for  an  invariant  subspace 
of  A,  and  that  the  Ritz  values  a(H )  C  <7  (A)  and  corresponding  Ritz  vectors  are  eigenpairs 
for  A.  The  second  observation  leads  to  the  Lanczos/ Arnoldi  process  (Arnoldi,  1951  and 
Lanczos,  1950). 

4.  The  Arnoldi  Factorization 

j Definition  :  If  A  €  C”X71  then  a  relation  of  the  form 

AVk  =  VkHk  +  fkeTk, 

where  Vk  e  Cnxfc  has  orthonormal  columns,  fk  =  0  and  Hk  6  Ckxk  is  upper  Hessenberg 
with  non- negative  subdiagonal  elements  is  called  a  k-step  Arnoldi  Factorization  of  A.  If  A 
is  Hermitian  then  Hk  is  real,  symmetric  and  tridiagonal  and  the  relation  is  called  a  k-step 
Lanczos  Factorization  of  A.  The  columns  of  Vk  are  referred  to  as  the  Arnoldi  vectors  or 
Lanczos  vectors  respectively. 

The  development  of  this  factorization  has  been  purely  through  the  consequences  of  the 
orthogonal  projection  imposed  by  the  Galerkin  conditions.  A  more  straightforward  but  less 
illuminating  derivation  is  to  simply  truncate  the  reduction  of  A  to  Hessenberg  form  that 
precedes  the  implicitly  shifted  QR-iteration  by  equating  the  first  k  columns  on  both  sides 
of  the  complete  reduction  AV  =  VH.  An  alternative  way  to  write  this  factorization  is 

AVk  =  (Vk,vk+1)  ^  j  where  f3k  =  \\fk\\  and  vk+1  =  j-fk  . 

This  factorization  may  be  used  to  obtain  approximate  solutions  to  a  linear  system  Ax  =  b 
if  b  =  V\f30.  The  purpose  here  is  to  investigate  the  use  of  this  factorization  to  obtain  ap¬ 
proximate  eigenvalues  and  eigenvectors.  The  discussion  of  the  previous  section  implies  that 
Ritz  pairs  satisfying  the  Galerkin  condition  are  immediately  available  from  the  eigenpairs 
of  the  small  projected  matrix  H . 

If  Hky  —  yd  then  the  vector  x  —  Vky  satisfies 

\\Ax  -  x0\\  =  \\(AVk  -  VkHk)y\\  =  \(3keTky\. 

The  number  \0kejy\  is  called  the  Ritz  estimate  for  this  the  Ritz  pair  (x,0)  as  an  approxi¬ 
mate  eigenpair  for  A.  Observe  that  if  (x,  6)  is  a  Ritz  pair  then 

0  =  yH  Hky  =  (Vky)HA(Vky)  =  xH  Ax 

is  a  Rayleigh  Quotient  (assuming  [|y||  =  1)  and  the  associated  Rayleigh  Quotient  residual 
r(x)  =  Ax  —  x0  satisfies 

IK*)II  =  \Pkeky\. 

When  A  is  Hermitian,  this  relation  may  be  used  to  provide  computable  rigorous  bounds  on 
the  accuracy  of  the  eigenvalues  of  H  as  approximations  to  eigenvalues  of  A;  see  (Parlett, 
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1980).  When  A  is  non-Hermitian  the  possibility  of  non-normality  precludes  such  bounds 
and  one  can  only  say  that  the  RQ-residual  is  small  if  \/3keJy\  is  small.  However,  in  either 
case,  if  /*.  =  0  these  the  Ritz  pairs  become  exact  eigenpairs  of  A . 

This  factorization  may  be  advanced  one  step  at  the  cost  of  a  (sparse)  matrix-vector 
product  involving  A  and  two  dense  matrix  vector  products  involving  V fcT  and  V*.  The 
explicit  steps  needed  to  form  a  &-Step  Arnoldi  factorization  are  shown  in  Algorithm  4. 

Algorithm  4:  The  fc-Step  Arnoldi  Factorization 


Input:  (A,  u) 

Put  Vi  =  v/||v||;  w  ~  Av\;ot\  —  vf  w; 

Put  /1  4-  w  -  v\ai  ;  V  *-  (vi);  H  —  (ori); 


For  j  =  1,  2,  3, ... k , 


(a4.1)  fij  =  II/, ||;  vj+i  -  /,//?,; 


(a4.2)  Vj+i  *  {VjyVj+ i)j  H j  *■ 


(a4.3)  2  Avj+i; 

(a4.4)  h  *-  Vf  z\  fj+i  *-z-  Vj+1h ; 
(a.4.5)  HJ+ 1  «-  (H3,  h); 

End.For 


In  exact  arithmetic,  the  columns  of  V  form  an  orthonormal  basis  for  the  Krylov  subspace 
and  H  is  the  orthogonal  projection  of  A  onto  this  space.  In  finite  precision  arithmetic,  care 
must  be  taken  to  assure  that  the  computed  vectors  are  orthogonal  to  working  precision. 
The  method  proposed  by  Daniel,  Gragg,  Kaufman  and  Stewart  (DGKS)  in  (Daniel  et  ah, 
1976)  provides  an  excellent  way  to  construct  a  vector  fj+ *  that  is  numerically  orthogonal 
to  Vj+i .  It  amounts  to  computing  a  correction 

s  I) *  1  f j + 1  ?  /j+i  ‘  fj+ 1  h  *  h  s * 

just  after  Step  (a4.4)  if  necessary.  A  simple  test  can  be  devised  to  avoid  this  DGKS  correc¬ 
tion  if  it  is  not  needed. 

The  dense  matrix-vector  products  at  Step  (a4.4)  and  also  the  correction  may  be  ac¬ 
complished  using  Level  2  BLAS.  This  is  quite  important  for  performance  on  vector,  and 
parallel-vector  supercomputers.  The  BLAS  operation  _GEMV  is  easily  parallelized  and 
vectorized  and  has  a  much  better  ratio  of  floating  point  computation  to  data  movement 
(Dongarra  et  al.,  1988  and  Dongarra  et  al.,  1991).  The  Modified  Gram-Schmidt  Process 
(MGS)  is  often  used  in  the  construction  of  Arnoldi  factorizations.  However,  MGS  will  defi¬ 
nitely  not  produce  numerically  orthogonal  basis  vectors  in  practice.  Moreover,  MGS  cannot 
be  formulated  in  terms  of  Level  2  BLAS  unless  all  of  the  vectors  to  be  orthogonalized  are 
known  in  advance  and  this  is  not  the  case  in  the  Arnoldi  process.  For  these  reasons,  classical 
Gram-Schmidt  orthogonalization  with  the  DGKS  correction  step  is  highly  recommended. 

The  information  obtained  through  this  process  is  completely  determined  by  the  choice 
of  the  starting  vector.  Eigen-information  of  interest  may  not  appear  until  k  gets  very  large. 
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In  this  case  it  becomes  intractable  to  maintain  numerical  orthogonality  of  the  basis  vectors 
Vk.  Moreover,  extensive  storage  will  be  required  and  repeatedly  finding  the  eigensystem  of 
H  will  become  prohibitive  at  a  cost  of  0(k3)  flops. 

Failure  to  maintain  orthogonality  leads  to  several  numerical  difficulties.  In  a  certain 
sense,  the  computation  (or  approximation)  of  the  projection  indicated  at  Step  (a4.4)  in  a 
way.  that  overcomes  these  difficulties  has  been  the  main  source  of  research  activity  in  these 
Krylov  subspace  projection  methods.  The  computational  difficulty  stems  from  the  fact  that 
||/fc[|  =  0  if  and  only  if  the  columns  of  14  span  an  invariant  subspace  of  A.  When  Vk  “nearly” 
spans  such  a  subspace  ||/4||  will  be  small.  Typically,  in  this  situation,  a  loss  of  significant 
digits  will  take  place  at  Step  (a4.4)  through  numerical  cancellation  unless  special  care  is 
taken  (i.e.,  use  of  the  DGKS  correction). 

It  is  desirable  for  ||/fc||  to  become  small  because  this  indicates  that  the  eigenvalues  of 
H  are  accurate  approximations  to  the  eigenvalues  of  A.  However,  this  “convergence”  will 
indicate  a  probable  loss  of  numerical  orthogonality  in  V.  Moreover,  if  subsequent  Arnoldi 
vectors  are  not  forced  to  be  orthogonal  to  the  converged  ones  then  components  along  these 
directions  re-enter  the  basis  via  round-off  effects  and  quickly  cause  a  spurious  copy  of  the 
previously  computed  eigenvalue  to  appear  repeatedly  in  the  spectrum  of  the  projected 
matrix  H.  The  identification  of  this  phenomenon  in  the  symmetric  case  and  the  first  rigorous 
numerical  treatment  is  due  to  Paige  (1971).  There  have  been  several  approaches  to  overcome 
this  problem  in  the  symmetric  case.  They  include:  (1)  complete  re-orthogonalization,  which 
may  be  accomplished  through  maintaining  V  in  product  Householder  form  (Walker,  1988) 
or  through  the  Modified  Gram-Schmidt  processes  with  re-orthogonalization  (Daniel  et  al., 
1976).  (2)  Selective  re-orthogonalization,  which  has  been  proposed  by  Parlett  and  has  been 
heavily  researched  by  him  and  his  students.  Most  notably,  the  theses  and  subsequent  papers 
and  computer  codes  of  Scott  and  of  Simon  have  developed  this  idea  (Parlett  and  Scott,  1979, 
Parlett,  1980,  and  Simon,  1984).  (3)  No  re-orthogonalization,  which  has  been  developed 
by  Cullum  and  her  colleagues.  This  last  option  introduces  the  almost  certain  possibility 
of  introducing  spurious  eigenvalues.  Various  techniques  have  been  developed  to  detect  and 
deal  with  the  presence  of  spurious  eigenvalues  (Cullum,  1978  and  Cullum  and  Willoughby, 
1981). 

The  appearance  of  spurious  eigenvalues  may  be  avoided  through  complete  orthogonal- 
ization  of  the  Arnoldi  (or  Lanczos)  vectors  using  the  DGKS  correction.  Computational  cost 
has  been  cited  as  the  reason  for  not  employing  this  option.  However,  the  cost  will  be  rea¬ 
sonable  if  one  is  able  to  fix  k  at  a  modest  size  and  then  update  the  starting  vector  v\  =  Vke\ 
while  repeatedly  doing  ft- Arnoldi  steps.  This  approach  was  introduced  in  (Karush,  1951) 
and  developed  further  by  (Cullum  and  Donath,  1974)  for  the  symmetric  case.  Saad  (1980, 
1984,  and  1992)  has  developed  explicit  restarting  for  the  nonsymmetric  case.  Restarting  has 
proven  to  have  important  consequences  for  the  development  of  numerical  software  based 
upon  Arnoldi’s  method  and  this  will  be  explored  in  the  following  section. 

5.  Restarting  the  Arnoldi  Method 

An  unfortunate  aspect  of  the  Lanczos/ Arnoldi  process  is  that  one  cannot  know  in  ad¬ 
vance  how  many  steps  will  be  required  before  eigenvalues  of  interest  are  well  approximated 
by  Ritz  values.  This  is  particularly  true  when  the  problem  has  a  wide  range  of  eigenvalues 
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but  the  eigenvalues  of  interest  are  clustered.  For  example,  in  computational  chemistry, 
problems  are  usually  symmetric  and  positive  definite  and  there  is  a  wide  range  of  eigenval¬ 
ues  varying  over  many  orders  of  magnitude.  Only  the  smallest  eigenvalues  are  physically 
interesting  and  they  are  typically  clustered  at  the  low  end  of  the  spectrum.  Shift  and  invert 
is  usually  not  an  option  because  of  fill  in  from  the  factorizations.  Without  a  spectral  trans¬ 
formation,  many  Lanczos  steps  are  required  to  obtain  the  smallest  eigenvalues.  In  order  to 
recover  eigenvectors,  one  is  obliged  to  store  all  of  the  Lanczos  basis  vectors  (usually  on  a 
peripheral  device)  and  to  solve  very  large  tri diagonal  eigenvalue  subproblems  at  each  step. 
In  the  Arnoldi  process  that  is  used  in  the  non-Hermitian  case,  not  only  do  the  basis  vectors 
have  to  be  stored,  but  the  cost  of  the  Hessenberg  eigenvalue  subproblem  is  0(k3)  at  the 
&-th  step. 

5.1.  Explicit  restarting 

An  alternative  has  been  proposed  by  Saad  based  upon  the  polynomial  acceleration 
scheme  developed  in  (ManteufFel,  1978)  for  the  iterative  solution  of  linear  systems.  Saad 
(1984)  proposed  to  restart  the  iteration  with  a  vector  that  has  been  preconditioned  so  that 
it  is  more  nearly  in  a  /.'-dimensional  invariant  subspace  of  interest.  This  preconditioning 
takes  the  form  of  a  polynomial  applied  to  the  starting  vector  that  is  constructed  to  damp 
unwanted  components  from  the  eigenvector  expansion.  The  resulting  algorithm  takes  the 
form: 

Algorithm  5:  An  Explicitly  Restarted  Arnoldi  Method 

Input:  (A,  v) 

Put  v !  =  v/|M|; 

For  j  =  1,  2,3, ...  until  convergence 

(a5.1)  Compute  an  m-step  Arnoldi  factorization 
AVm  =  VmHm  +  fmem  with  Vme i  =  v\  ; 

(a5.2)  Compute  a(Hm)  and  corresponding  Ritz  estimates 

and  halt  if  desired  eigenvalues  are  well  approximated. 

(a5.3)  Construct  a  polynomial  ip  based  upon  cr{Hm) 
to  damp  unwanted  components. 

(a5.4)  vi  —  ip(A)v i;  Vi  <-  t>i/||vi||  ; 

End-For 

The  construction  of  the  polynomial  at  Step  (a5.3)  may  be  guided  by  a  priori  information 
about  the  spectrum  of  A  or  solely  by  information  gleaned  from  a(Hm).  A  typical  scheme  is 
to  sort  the  spectrum  of  Hm  into  two  disjoint  sets  flw  and  Qu,  with  cr(Hm )  =  £lw  U  Clu.  The 
Ritz  values  in  the  set  QUI  are  to  be  regarded  as  approximations  to  the  “wanted”  eigenvalues 
of  A  and  an  open  convex  set  Cu  containing  with  Slwr\Cu  =  0  is  to  be  regarded  as  a  region 
that  approximately  encloses  the  “unwanted”  portion  of  the  spectrum  of  A.  The  polynomial 
ib  is  then  constructed  to  be  as  small  in  magnitude  as  possible  on  Cu  when  normalized,  for 
example,  to  take  the  value  1  at  an  element  of  £lw  closest  to  dCu.  Chebyshev  polynomials  are 
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appropriate  when  Cu  is  taken  to  be  an  ellipse  and  this  was  the  original  proposal  of  Saad  when 
he  adapted  the  Manteuffel  idea  to  eigenvalue  calculations.  Another  possibility  explored  by 
Saad  has  been  to  take  Cu  to  be  the  convex  hull  of  f lu  and  to  construct  the  polynomial  ip  that 
best  approximates  0  on  this  set  in  the  least  squares  sense.  Both  of  these  are  based  upon 
well-known  theory  of  polynomial  approximation.  The  problem  of  constructing  an  optimal 
ellipse  for  this  problem  has  been  studied  by  Chatelin  and  Ho.  The  reader  is  referred  to 
(Chatelin  and  Ho,  1990)  for  details  of  constructing  these  polynomials. 

The  reasoning  behind  this  type  of  algorithm  is  that  that  if  is  a  linear  combination  of 
precisely  k  eigenvectors  of  A  then  Arnoldi  factorization  terminates  in  k  steps  (i.e.,  fk  =  0). 
The  columns  of  Vk  will  form  an  orthonormal  basis  for  the  invariant  subspace  spanned  by 
those  eigenvectors,  and  the  Ritz  values  <r(Hk)  will  be  the  corresponding  eigenvalues  of  A. 
The  update  of  the  starting  vector  v\  is  designed  to  enhance  the  components  of  this  vector 
in  the  directions  of  the  wanted  eigenvectors  and  damp  its  components  in  the  unwanted 
directions.  This  effect  is  achieved  at  Step  (a5.4)  since 

71  U 
n  =  =*■  ^(A)vi  =  Y,xMxj)v- 

j= i 

If  the  same  polynomial  were  applied  each  time,  then  after  M  iterations,  the  j- th  original 
expansion  coefficient  would  be  essentially  attenuated  by  a  factor 

W (h)J  ' 

where  the  eigenvalues  have  been  ordered  according  decreasing  values  |V»(Aj))|.  The  eigen¬ 
values  inside  the  region  Cu  become  less  and  less  significant  as  the  iteration  proceeds.  Hence, 
the  wanted  eigenvalues  are  approximated  increasingly  well  as  the  iteration  proceeds. 

Another  restarting  strategy  proposed  by  Saad  is  to  replace  the  starting  vector  with  a 
linear  combination  of  Ritz  vectors  corresponding  to  wanted  Ritz  values.  If  the  eigenvalues 
and  corresponding  vectors  are  re-indexed  so  that  the  first  k  are  wanted  and  ( Xj,6j )  is  the 
the  Ritz  pair  approximating  the  eigenpair  (xj,Xj)  then 

k 

j  (4) 

j= i 

is  taken  as  the  new  starting  vector.  Again,  the  motivation  here  is  that  the  Arnoldi  residual 
fk  would  vanish  if  these  k  Ritz  vectors  were  actually  eigenvectors  of  A  and  the  Ritz  vectors 
are  the  best  available  approximations  to  these  eigenvectors.  A  heuristic  choice  for  the 
coefficients  7j  has  also  been  suggested  by  Saad  (1980).  It  is  to  weight  the  j-th  Ritz  vector 
with  the  value  of  its  Ritz  estimate  and  then  normalize  so  that  the  new  starting  vector 
has  norm  1.  This  has  the  effect  of  favoring  the  Ritz  vectors  that  have  least  converged. 
Additional  aspects  of  explicit  restarting  are  developed  thoroughly  in  Chapter  VII  of  (Saad, 
1992).  In  any  case,  this  restarting  mechanism  is  actually  polynomial  restarting  in  disguise. 
Since  xj  €  ICm(A,v i)  implies  Xj  =  <pj{A)v i  for  some  polynomial  <pj  the  formula  for  vf  in 
(4)  is  of  the  form 

k 

*1  «-  (p(A)vi  =  Y^lj<t>AA)v-L-  (5) 

3= 1 
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The  technique  just  described  is  referred  to  as  explicit  (polynomial)  restarting.  When 
Chebyshev  polynomials  are  used  it  is  called  an  Arnoldi- Chebyshev  method.  The  cost  in 
terms  of  matrix- vector  products  w  Av  is  M  *  (m  +  deg(ifi))  for  M  major  iterations.  The 
cost  of  the  arithmetic  in  the  Arnoldi  factorization  is  M  *  (2 n  *  m2  +  0(m3))  Flops  (floating 
point  operations).  Tradeoffs  must  be  made  in  terms  of  cost  of  the  Arnoldi  factorization  vs. 
cost  of  the  matrix-vector  products  Av  and  also  in  terms  of  storage  (nm  +  0(m2)). 


5.2.  Implicit  restarting 

There  is  another  approach  to  restarting  that  offers  a  more  efficient  and  numerically 
stable  formulation.  This  approach,  called  implicit  restarting ,  is  a  technique  for  combining 
the  implicitly  shifted  QR  mechanism  with  a  k-step  Arnoldi  or  Lanczos  factorization  to 
obtain  a  truncated  form  of  the  implicitly  shifted  QR-iteration.  The  numerical  difficulties 
and  storage  problems  normally  associated  with  Arnoldi  and  Lanczos  processes  are  avoided. 
The  algorithm  is  capable  of  computing  a  few  (fc)  eigenvalues  with  user  specified  features  such 
as  largest  real  part  or  largest  magnitude  using  2nk  +  0(k2)  storage.  No  auxiliary  storage 
is  required.  The  computed  Schur  basis  vectors  for  the  desired  fc-dimensional  eigenspace  are 
numerically  orthogonal  to  working  precision.  This  method  is  well  suited  to  the  development 
of  mathematical  software  and  this  will  be  discussed  in  Section  7. 

Implicit  restarting  provides  a  means  to  extract  interesting  information  from  very  large 
Krylov  subspaces  while  avoiding  the  storage  and  numerical  difficulties  associated  with  the 
standard  approach.  It  does  this  by  continually  compressing  the  interesting  information  into 
a  fixed  size  /^-dimensional  subspace.  This  is  accomplished  through  the  implicitly  shifted  QR 
mechanism.  An  Arnoldi  factorization  of  length  m  =  k  +  p 

AVm  =  VmHm  +  fmemi  (6) 

is  compressed  to  a  factorization  of  length  k  that  retains  the  eigen-information  of  interest. 
This  is  accomplished  using  QR  steps  to  apply  p  shifts  implicitly.  The  first  stage  of  this  shift 
process  results  in 

AV*  =  V*H*  +  fmelQ,  (7) 

where  V+  -  VmQ ,  H+  =  QT HmQ,  and  Q  =  Q1Q2  •  •  with  Qj  the  orthogonal  matrix 
associated  with  the  shift  fij.  It  may  be  shown  that  the  first  k  —  1  entries  of  the  vector  e^Q 
are  zero  (i.e.  e^Q  =  (aef,gr)  ).  Equating  the  first  k  columns  on  both  sides  yields  an 
updated  k— step  Arnoldi  factorization 

AV?  =  V?H?  +  f?  cl,  (8) 

with  an  updated  residual  of  the  form  =  Vj^_pek+i Pk  +  fk+pV*  Using  this  as  a  starting 
point  it  is  possible  to  apply  p  additional  steps  of  the  Arnoldi  process  to  return  to  the  original 
m-step  form. 

Each  of  these  shift  cycles  results  in  the  implicit  application  of  a  polynomial  in  A  of 
degree  p  to  the  starting  vector. 

v 

v\  ip(A)v i  with  ip(\)  =  ri(A  -  A*i)- 

l 
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The  roots  of  this  polynomial  are  the  shifts  used  in  the  QR  process  and  these  may  be 
selected  to  filter  unwanted  information  from  the  starting  vector  and  hence  from  the  Arnoldi 
factorization.  Full  details  may  be  found  in  (Sorensen,  1992).  The  basic  iteration  is  given 
here  in  Algorithm  6  and  the  diagrams  in  Figures  1-3  describe  how  this  iteration  proceeds 
schematically.  In  Algorithm  6  and  in  the  discussion  below,  the  notation  denotes 

the  leading  n  x  k  submatrix  of  M. 


Figure  1:  Representation  of  Vk+pHk+P  +  /fc+Pef+p-  Shaded  regions  denote 
lionzeros. 


Figure  2:  V k+pQQT  Hk+PQ  +  fk+Pel+pQ  after  p  implicitly  shifted  QR  steps. 


+ 


Figure  3:  Leading  k  columns  V^Hk  +  fk€\  ^orm  a  length  k  Arnoldi  factor¬ 
ization  after  discarding  the  last  p  columns. 
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Observe  that  if  m  =  n  then  /  =  0  and  this  iteration  is  precisely  the  same  as  the  Implicitly 
Shifted  QR  iteration.  Even  for  m  <  n,  the  first  k  columns  of  V  and  the  Hessenberg 
submatrix  i-.k)  are  mathematically  equivalent  to  the  matrices  that  would  appear  in  the 
full  Implicitly  Shifted  QR  iteration  using  the  same  shifts  fij.  In  this  sense,  the  Implicitly 
Restarted  Arnoldi  method  may  be  viewed  as  a  truncation  of  the  Implicitly  Shifted  QR 
iteration.  The  fundamental  difference  is  that  the  standard  Implicitly  Shifted  QR  iteration 
selects  shifts  to  drive  subdiagonal  elements  of  H  to  zero  from  the  bottom  up  while  the 
shift  selection  in  the  Implicitly  Restarted  Arnoldi  method  is  made  to  drive  subdiagonal 
elements  of  H  to  zero  from  the  top  down.  Important  implementation  details  concerning  the 
deflation  (setting  to  zero)  of  subdiagonal  elements  of  H  and  the  purging  of  unwanted  but 
converged  Ritz  values  are  beyond  the  scope  of  this  discussion.  However,  these  details  are 
extremely  important  to  the  success  of  this  iteration  in  difficult  cases.  Complete  details  of 
these  numerical  refinements  may  be  found  in  (Lehoucq,  1995  and  Lehoucq  and  Sorensen, 
1994). 

Algorithm  6:  An  Implicitly  Restarted  Arnoldi  Method 

Input:  (A,  V,  H ,  /)  with  AVm  —  VmHm  /mcm, 

(an  m-Step  Arnoldi  Factorization); 

For  £  =  1,2, 3, ...  until  convergence 

(a6.2)  Compute  cr(Hm)  and  select  set  of  p  shifts  --lip 

based  upon  a(Hm)  or  perhaps  other  information; 

(a6.3)  q  em; 

(a6.4)  For  j  —  l,2,...,p, 

Factor  [<?>,  Rj ]  =  q r(Hm  ~  Hjl)’, 

Hm  -  QfHmQj  ;  Vm  <-  VmQj; 

q*~qHQj; 

EndJFbr 

(a6.5)  fk  ♦  Vk+lftk  “I"  fm&k\  Vfc  *■  5  *  Hm(l:k,l:k)i 

(a6.6)  Beginning  with  the  fc-step  Arnoldi  factorization 
AVk=VkHk+fkel, 

apply  p  additional  steps  of  the  Arnoldi  process 
to  obtain  a  new  m-step  Arnoldi  factorization 
AVm  ~  VmHm  d"  fm&m  • 

End_For 

The  above  iteration  can  be  used  to  apply  any  known  polynomial  restart.  If  the  roots 
of  the  polynomial  are  not  known  there  is  an  alternative  implementation  that  only  requires 
one  to  compute  q\  =  rf:(H)e i,  where  tp  is  the  desired  degree  p  polynomial.  A  sequence  of 
Householder  transformations  may  developed  to  form  a  unitary  matrix  Q  such  that  Qe\  =  q\ 
and  H  <—  QH HQ  is  upper  Hessenberg.  The  details  which  follow  standard  developments  for 
the  Implicitly  Shifted  QR  iteration  will  be  omitted  here. 


15 


A  shift  selection  strategy  that  has  proved  successful  in  practice  is  called  the  “Exact  Shift 
Strategy”.  In  this  strategy,  one  computes  cr(H)  and  sorts  this  into  two  disjoint  sets  ftw  and 
Qu.  The  k  Ritz  values  in  the  set  are  regarded  as  approximations  to  the  “wanted” 
eigenvalues  of  A ,  and  the  p  Ritz  values  in  the  set  are  taken  as  the  shifts  pj.  An 
interesting  consequence  (in  exact  arithmetic)  is  that  after  Step  (a6.4)  above,  the  spectrum 
of  Hk  in  Step  (a6.5)  is  cr(Hk)  =  Slw  and  the  updated  starting  vector  V\  is  a  particular  linear 
combination  of  the  k  Ritz  vectors  associated  with  these  Ritz  values.  In  other  words,  the 
implicit  restarting  scheme  with  exact  shifts  provides  a  specific  selection  of  the  coefficients  7 j 
in  Eq.  (4)  and  this  implicit  scheme  costs  p  rather  than  the  k  +  p  matrix- vector  products  the 
explicit  scheme  would  require.  Thus  the  exact  shift  strategy  can  be  viewed  both  as  a  means 
to  damp  unwanted  components  from  the  starting  vector  and  also  as  directly  forcing  the 
starting  vector  to  be  a  linear  combination  of  wanted  eigenvectors.  The  exact  shift  strategy 
has  two  additional  interesting  theoretical  properties. 

Lemma  5  If  H  is  unreduced  and  diagonalizable  then: 

1.  The  polynomial  (j)  in  (5)  satisfies  <j>( A)  =  ^(A)p(A); 

where  $  is  the  exact  shift  polynomial  and  p  is  some 
polynomial  of  degree  at  most  fc  —  1. 

2.  The  updated  Krylov  subspace  generated  by  the  new 
starting  vector  satisfies 

JCm(A,  vf)  =  Span{xi,x2,  ■  •  •  ,Xk,Axj,A2Xj,- ■  ■  ,ApXj} 
for  j  =  1,2,-  k. 

The  first  property  </>(A)  =  ip(X)p(X)  indicates  that  the  linear  combination  selected  by  the 
exact  shift  scheme  is  somehow  minimal  while  the  second  property  indicates  that  each  of  the 
subspaces  Kp(A,xf)  C  ICm(A,vf)  so  that  each  sequence  of  “wanted”  Ritz  vectors  is  rep¬ 
resented  equally  in  the  updated  subspace.  The  first  property  was  established  in  (Lehoucq, 
1995)  along  with  an  extensive  analysis  of  the  numerical  properties  of  implicit  restarting.  The 
surprising  second  property  was  established  in  (Morgan,  1996),  along  with  some  compelling 
numerical  results  indicating  superior  performance  of  implicit  over  explicit  restarting. 

6.  The  Generalized  Eigenvalue  Problem 

A  typical  source  of  large-scale  eigenproblems  is  through  a  discrete  form  of  a  contin¬ 
uous  problem.  The  resulting  finite-dimensional  problems  become  large  due  to  accuracy 
requirements  and  spatial  dimensionality.  Typically  this  takes  the  form 


Cu  =  uX  in  Cl,  (9) 

u  satisfies  B  on  dCl, 

where  £  is  some  linear  differential  operator.  A  number  of  techniques  may  be  used  to 
discretize  £.  The  finite  element  method  provides  an  elegant  discretization.  If  W  is  a  space 
of  functions  in  which  the  solution  to  (9)  may  be  found  and  Wn  C  W  is  an  n- dimensional 
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subspace  with  basis  functions  {4>j}  then  an  approximate  solution  un  is  expanded  in  the 
form 

n 

un  =  'y 

j=\ 

A  variational  or  a  Galerkin  principle  is  applied  depending  on  whether  or  not  C  is  self-adjoint, 
leading  to  a  weak  form  of  (9) 

A(v,u)  =  A  <  v,  u  >,  (10) 

where  A{v,  u )  is  a  bilinear  form.  Substituting  the  expanded  form  of  u  =  un  and  requiring 
(10)  to  hold  for  each  trial  function  v  =  <pi  gives  a  set  of  algebraic  equations 


n  n 

A(<j>i-,  <i>j  & )  =  ^  ^  'y .  ->> 

3= 1  i=i 

where  <  •,  •  >  is  an  inner  product  in  VV„.  This  leads  to  the  following  systems  of  equations 

i>(*,*j)& =  (ii) 

j=i  i=i 


for  1  <  t  <  n.  We  may  rewrite  (11)  and  obtain  the  matrix  equation 

Ax  —  A  Mx, 


where 

A{ J  =  A(<pj.  ^j),  AfjJ  =<  <f>i i  <f)j  >,  -T  =  [Cl >  •  •  •  i  Cn]  ? 

for  1  <  i,j  <  n.  Typically  the  basis  functions  are  chosen  so  that  few  entries  in  a  row  of 
A  or  M  are  nonzero.  In  structures  problems  A  is  called  the  “stiffness”  matrix  and  M  is 
called  the  “mass”  matrix.  In  chemistry  and  physics  M  is  often  referred  to  as  the  “overlap” 
matrix.  A  nice  feature  of  this  approach  to  discretization  is  that  if  the  basis  functions  4>j  all 
individually  satisfy  B  on  89.  then  the  boundary  conditions  are  naturally  incorporated  into 
the  discrete  problem.  Moreover,  in  the  self-adjoint  case,  the  Rayleigh  principle  is  preserved 
from  the  continuous  to  the  discrete  problem.  In  particular,  since  Ritz  values  are  Rayleigh 
quotients,  this  assures  the  smallest  Ritz  value  is  greater  than  the  smallest  eigenvalue  of  the 
original  problem. 

Thus,  it  is  natural  for  large-scale  eigenproblems  to  arise  as  generalized  rather  than 
standard  problems.  If  C  is  self-adjoint  the  discrete  problems  are  symmetric  or  Hermitian 
and  if  not  the  matrix  A  is  nonsymmetric  but  the  matrix  M  is  symmetric  and  at  least 
positive  semi-definite.  There  are  a  number  of  ways  to  convert  the  generalized  problem  to 
standard  form.  There  is  always  motivation  to  preserve  symmetry  when  it  is  present. 

If  M  is  positive  definite  then  there  exists  a  factorization  M  =  LLT,  and  the  eigenvalues 
of  A  =  L~xAL~t  are  the  eigenvalues  of  (A,  M),  and  the  eigenvectors  are  obtained  by 
solving  Ltx  =  x,  where  x  is  an  eigenvector  of  A.  This  standard  transformation  is  fine  if  one 
wants  the  eigenvalues  of  largest  magnitude  and  it  preserves  symmetry  if  A  is  symmetric. 
However,  when  M  is  ill-conditioned  this  can  be  a  dangerous  transformation  leading  to 
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numerical  difficulties.  Since  a  matrix  factorization  will  have  to  be  done  anyway,  one  may 
as  well  formulate  a  spectral  transformation. 


6.1.  Structure  of  the  spectral  transformation 

A  convenient  way  to  provide  a  spectral  transformation  is  to  note  that 
Ax  =  XMx  -$=>•  ( A  —  /j,M)x  =  (A  -  yu)M x 


Thus 

(A  —  nM)~1Mx  =  xO,  where  0  =  - - . 

A  —  fi 

If  A  is  symmetric  then  one  can  maintain  symmetry  in  the  Arnoldi/Lanczos  process  by 
taking  the  inner  product  to  be 

<  x,y  >=  xT My. 

It  is  easy  to  verify  that  the  operator  ( A  -  is  symmetric  with  respect  to  this  inner 

product  if  A  is  symmetric.  In  the  Arnoldi/Lanczos  process  the  matrix- vector  product  w  *- 
Av  is  replaced  by  w  (A- nM)~lMv  and  the  step  h  •*—  VT f  is  replaced  by  h  <—  VT(M /). 
If  A  is  symmetric  then  the  matrix  H  is  symmetric  and  tridiagonal.  Moreover,  this  process 
is  well  defined  even  when  M  is  singular  and  this  can  have  important  consequences  even  if 
A  is  nonsymmetric.  We  shall  refer  to  this  process  as  the  M-Arnoldi  process. 

If  M  is  singular  then  the  operator  S  =  (A  —  /xM)_1M  has  a  nontrivial  null  space  and 
the  bilinear  function  <  x,y  >=  xT My  is  a  semi-inner  product  and  ||a? || jvr  =<  x,y  >*/2  is  a 
semi-norm.  Since  (A  —  pM)  is  assumed  to  be  nonsingular,  Af  =Null(S )  =Null{M).  Vectors 
in  A f  are  generalized  eigenvectors  corresponding  to  infinite  eigenvalues.  Typically,  one  is 
only  interested  in  the  finite  eigenvalues  of  (A,  M)  and  these  will  correspond  to  the  nonzero 
eigenvalues  of  S.  The  invariant  subspace  corresponding  to  these  nonzero  eigenvalues  is 
easily  corrupted  by  components  of  vectors  from  Af  during  the  Arnoldi  process.  However, 
using  the  M-Arnoldi  process  with  some  refinements  can  provide  a  solution. 

In  order  to  better  understand  the  situation,  it  is  convenient  to  note  that  since  M  is 
positive  semi-definite,  there  is  an  orthogonal  matrix  Q  such  that 


M  =  Q 


D  0 
0  0 


where  D  is  a  positive  definite  diagonal  matrix  of  order  n,  say.  Thus 

§ = QTSQ = 

where  5i  is  a  square  matrix  of  order  n  and  52  is  an  m  x  n  matrix  with  the  original  A,  M 
being  of  order  m  +  n.  Observe  now  that  a  nonzero  eigenvalue  A  of  5  satisfies  Sx  =  xX  ,  i.e. 


Si*i 

x\X 

.  S2XI  . 

x2X 

so  that  X2  =  y must  hold.  Note  also  that  for  any  eigenvector  xH  —  (x^xt?),  the 
leading  vector  *1  must  be  an  eigenvector  of  Si.  Since  5  is  block  triangular,  cr(5)  =  cr(5i)U 


51  0 

52  0  ’ 
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<t(0 m).  Assuming  52  has  full  rank,  it  follows  that  if  S 1  has  a  zero  eigenvalue  then  there  is  no 
corresponding  eigenvector  (since  S2X1  =  0  would  be.  implied).  Thus  if  zero  is  an  eigenvalue 
of  S\  with  algebraic  multiplicity  m0  then  zero  is  an  eigenvalue  of  S  of  algebraic  multiplicity 
m  +  m0  and  with  geometric  multiplicity  m.  Of  course,  since,  S  is  similar  to  S  all  of  these 
statements  hold  for  S  as  well. 


6.2,  Eigenvector/ null-space  purification 

With  these  observations  in  hand,  it  is  possible  to  see  the  virtue  of  using  M- Arnoldi  on 
S .  After  fc-steps  of  M-Arnoldi, 

SV  =  VH  +  feTk  with  VT MV  =  Ik,VTMf  =  0. 

Introducing  the  similarity  transformation  Q  gives 

SV  =  VH  +  feJ  with  VTQTMQV  =  Ik,VTQTMQf  =  0, 

where  V  =  QTV  and  /  =  QTf.  Partitioning  VT  =  {V^V^)  and  fT  =  /J )  consistent 

with  the  blocking  of  S  gives 

SM  =  VXH  +  fxeTk  with  V?DVX  =  Ik,V?Dfx  =  0. 

Moreover,  the  side  condition  52 hi  =  V2H  +  /2ej[  holds,  so  that  in  exact  arithmetic  a  zero 
eigenvalue  should  not  appear  as  a  converged  Ritz  value  of  H.  This  argument  shows  that 
M-Arnoldi  on  S  is  at  the  same  time  doing  P-Arnoldi  on  Si  while  avoiding  convergence  to 
zero  eigenvalues. 

Round-off  error  due  to  finite  precision  arithmetic  will  cloud  the  situation,  as  usual.  It 
is  clear  that  the  goal  is  to  prevent  components  in  N  from  corrupting  the  vectors  V .  Thus, 
to  begin,  the  starting  vector  Vi  should  be  of  the  form  Vi  =  Sv .  If  a  final  approximate 
eigenvector  x  has  components  in  M  they  may  be  purged  by  replacing  x  Sx  and  then 

normalizing.  To  see  the  effect  of  this,  note  that  if  x  =  Q  then  Sx  =  Q  , 

%2  *^2^1 

and  all  components  in  N  which  are  of  the  form  Q  ^  will  have  been  purged.  This 

final  application  of  S  may  be  done  implicitly  in  two  ways.  One  is  to  note  that  if  x  =  Vy 
with  Hy  —  yd  then  Sx  =  VHy  +  feky  =  xd  +  fejy,  and  this  is  the  correction  suggested 
by  (Nour-Omid  et  al.,  1987).  Another  recent,  suggestion  due  to  Meerbergen  and  Spence 
is  to  use  implicit  restarting  with  a  zero  shift  (Meerbergen  and  Spence,  1995).  Recall  that 
implicit  restarting  with  l  zero  shifts  is  equivalent  to  starting  the  M-Arnoldi  process  with  a 
starting  vector  of  StV\  and  all  the  resulting  Ritz  vectors  will  be  multiplied  by  Se  as  well. 
After  applying  the  implicit  shifts  to  H ,  the  leading  submatrix  of  order  k  —  i  will  provide  the 
updated  Ritz  values.  No  additional  explicit  matrix-vector  products  with  S  are  required. 

The  ability  to  apply  t  zero  shifts  (i.e.,  to  multiply  by  Sl  implicitly)  is  very  important 
when  Sx  has  zero  eigenvalues.  If  =  0  then 


Si  0 

s2  0 


0 

S2Z1 


€  M. 
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Thus  to  completely  eradicate  components  from  Af  one  must  multiply  by  Se,  where  l  is  equal 
to  the  dimension  of  the  largest  Jordan  block  corresponding  to  a  zero  eigenvalue  of  S\. 

Spectral  transformations  were  studied  extensively  by  Ericsson  and  Ruhe  (1980)  and  the 
first  eigenvector  purification  strategy  was  developed  in  (Nour-Omid  et  al.,  1987).  Shift  and 
invert  techniques  play  an  essential  role  in  the  block  Lanczos  code  developed  by  Grimes, 
Lewis,  and  Simon.  The  many  nuances  of  this  technique  in  practical  applications  are  dis¬ 
cussed  thoroughly  in  (Grimes  et  al.,  1994).  The  development  presented  here  and  the  eigen¬ 
vector  purification  through  implicit  restarting  is  due  to  Meerbergen  and  Spence  (1995). 


6.3.  An  example 

This  discussion  is  illustrated  with  the  following  example. 


A  = 


K  C 
CT  0 


and  M  = 


I  0 
0  0  5 


with  A  an  order  325  matrix  approximation  to  a  convection-diffusion  operator  and  C  a 
structured  random  matrix.  This  example  was  chosen  because  it  has  the  block  structure  of 
a  typical  steady-state  Navier-Stokes  linear  stability  analysis;  see  (Meerbergen  and  Spence, 
1995).  The  following  MATLAB  code  was  used  to  generate  the  example: 


rand( ’seed’ ,0) ; 
n  =  225;m=100; 

K  =  lapc(n,100); 

C  =  [rand(m,m)  ;  zeros (n-m,m)] ; 

M  =  [eye(n)  zeros(n,m)  ;  zeros(m,n)  zeros(m,m)]; 
A  =  [K  C  ;  C’  zeros(m,m)]; 
mu  =  7.0; 

S  =  (A  -  mu*M)\M; 


The  matrices  K ,  C ,  M ,  A  correspond  to  the  matrices  in  the  equations  above.  The  function 
lapc  computes  a  finite  difference  approximation  to  A u  +  pux  on  a  15  x  15  regular  grid  in 
the  unit  square  with  p  =  100.  Any  matrix  pencil  (A,  M )  with  this  block  structure  (assuming 
C  has  full  rank  and  A  —  pM  is  nonsingular)  will  produce  an  S  of  the  form 


S  = 


0  0  0 

0  S22  0 

S31  S 32  0 


with  the  upper-left  zero  block  of  order  m  and  with  S22  nonsingular  and  order  n  —  m.  From 
the  above  discussion  one  may  conclude  that  5  has  an  eigenvalue  0  with  algebraic  multiplicity 
2m  and  geometric  multiplicity  m.  There  are  three  important  subspaces  associated  with  S. 
They  are  AT  ,  Q  and  71 ,  and  these  spaces  satisfy 


sat  =  {0}  ,  sgcAf  ,  sncn. 


All  of  C"  may  be  represented  as  a  direct  sum  of  these  three  spaces.  The  (oblique)  projectors 
associated  with  these  spaces  shall  be  denoted  by  jFV  ,  Pg,  and  Pn  respectively.  Explicit 
formulas  are: 
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mrn  m  11^11  b i 

3.70  1.48(-11)  1.32(-11)  2.85(-12) 

Table  1:  Projection  of  V  onto  J\f  and  Q 

j _ [[Axj  —  AfigjAjH _ ||(Axj  —  MxjXjY^ 

1  1.50(-03)  9.93(-06) 

2  l.ll(-02)  6.77(-05) 

Table  2:  Residuals  before  and  after  purging  components  from  A  and  Q 


Pm 


Pg 


Pn 


0  0  0 
0  0  0 
o  -sZ2s£  I 

I  0  0  ' 

0  0  0, 

0  0  0 

0  0  0  ' 

0  S22 

£31  Sz2  0 


Table  1  shows  the  norms  of  the  projections  of  the  basis  vectors  V  onto  the  spaces  N  and 
(/,  where  V  was  computed  with  20  steps  of  M-Arnoldi  starting  with  a  vector  V\  =  Sv  (v  a 
vector  with  all  entries  equal  to  1).  The  norms  of  the  projections  are  taken  before  and  after 
purging  by  applying  two  zero  shifts  using  implicit  restarting.  The  “+”  symbol  denotes  the 
updated  basis  after  purging. 

Table  2  shows  the  residual  norms  for  the  two  approximate  eigenvalues  that  are  closest 
to  the  shift  n  before  and  after  purging. 

Clearly,  there  is  considerable  merit  to  doing  this  purging.  This  generalizes  the  purging 
proposed  in  (Nour-Omid  et  al.,  1995)  and  seems  to  be  quite  promising.  Further  testing  is 
needed  but  some  form  of  this  process  is  essential  to  the  construction  of  numerical  software 
to  implement  shift-invert  strategies. 


7.  Software,  Performance,  and  Parallel  Computation 

The  Implicitly  Restarted  Arnoldi  Method  has  been  implemented  and  a  package  of  For¬ 
tran  77  subroutines  has  been  developed.  This  software,  called  ARPACK  (Lehoucq  et  al., 
1994),  provides  several  features  which  are  not  present  in  other  codes  based  upon  a  single¬ 
vector  Arnoldi  process.  One  of  the  most  important  features  from  the  software  standpoint 
is  the  reverse  communication  interface.  This  feature  provides  a  convenient  way  to  interface 
with  application  codes  without  imposing  a  structure  on  the  user’s  matrix  or  the  way  a 
matrix-vector  product  is  accomplished.  In  the  parallel  setting,  this  reverse  communication 
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interface  enables  efficient  memory  and  communication  management  for  massively  parallel 
MIMD  and  SIMD  machines.  The  important  features  of  ARPACK  are: 

•  A  reverse  communication  interface. 

•  Ability  to  return  k  eigenvalues  that  satisfy  a  user  specified  criterion,  such  as  largest 
real  part,  largest  absolute  value,  largest  algebraic  value  (symmetric  case),  etc. 

•  A  fixed  pre-determined  storage  requirement  throughout  the  computation.  Usually 
this  is  n  *  0(2k)  +  0(k 2),  where  k  is  the  number  of  eigenvalues  to  be  computed  and 
n  is  the  order  of  the  matrix.  No  auxiliary  storage  or  interaction  with  such  devices  is 
required  during  the  course  of  the  computation. 

•  Eigenvectors  computed  on  request.  The  Arnoldi  basis  of  dimension  k  is  always  com¬ 
puted.  The  Arnoldi  basis  consists  of  vectors  which  are  numerically  orthogonal  to 
working  accuracy.  Computed  eigenvectors  of  symmetric  matrices  are  also  numerically 
orthogonal. 

•  User-specified  numerical  accuracy  of  the  computed  eigenvalues  and  vectors.  Residual 
tolerances  may  be  set  to  the  level  of  working  precision.  At  working  precision,  the 
accuracy  of  the  computed  eigenvalues  and  vectors  is  consistent  with  the  accuracy 
expected  of  a  dense  method  such  as  the  implicitly  shifted  QR  iteration. 

•  No  theoretical  or  computational  difficulty  for  multiple  eigenvalues,  other  than  addi¬ 
tional  matrix- vector  products  required  to  expose  the  multiple  instances.  This  is  made 
possible  through  the  implementation  of  deflation  techniques  similar  to  those  employed 
to  make  the  implicitly  shifted  QR-algorithm  robust  and  practical.  A  block  method  is 
not  required;  hence,  one  does  not  need  to  “guess”  the  correct  blocksize  that  would  be 
needed  to  capture  multiple  eigenvalues. 


7.1.  Reverse  communication  interface 

As  mentioned  above,  the  reverse  communication  interface  is  one  of  the  most  important 
aspects  of  the  design  of  ARPACK.  In  the  serial  code,  a  typical  usage  of  this  interface  is 
illustrated  with  the  following  example,  where  snaupd  is  an  ARPACK  module: 

10  continue 

call  snaupd  (ido,  bmat,  n,  which,..., 

*  V,  . . ,  work,  info) 
if  (ido  .eq.  newprod)  then 

call  matvec  (’A’,  n,  workd ( ipntr ( 1 ) ) , 

*  workd(ipntr(2))) 
else 

return 
endif 
go  to  TO 
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As  usual,  with  reverse  communication,  control  is  returned  to  the  calling  program  when 
interaction  with  the  matrix  A  is  required.  The  action  requested  of  the  calling  program  is 
to  simply  perform  the  action  indicated  by  the  reverse  communication  parameter  ido  (in 
this  case,  multiply  the  vector  held  in  the  array  workd  beginning  at  location  ipntr(l)  and 
put  the  result  in  the  array  workd  beginning  at  location  ipntr(2)).  Note  that  call  to  the 
subroutine  matvec  in  this  code  segment  is  simply  meant  to  indicate  that  this  matrix- vector 
operation  is  taking  place.  The  user  is  free  to  use  any  available  mechanism  or  subroutine  to 
accomplish  this  task.  In  particular,  no  specific  data  structure  is  imposed  and,  indeed,  no 
explicit  representation  of  the  matrix  is  even  required.  One  only  needs  to  supply  the  action 
of  the  matrix  on  the  specified  vector. 

There  are  several  reasons  for  supplying  this  interface.  It  is  more  convenient  to  use  with 
large  application  codes.  The  alternative  is  to  put  the  user  supplied  matrix- vector  product 
in  a  subroutine  with  a  pre-specified  calling  sequence.  This  may  be  quite  cumbersome  and 
is  especially  so  in  those  cases  where  the  action  of  the  matrix  on  a  vector  is  known  only 
through  a  lengthy  computation  that  doesn’t  involve  the  matrix  A  explicitly.  Typically,  if 
the  matrix- vector  product  must  be  provided  in  the  form  of  a  subroutine  with  a  fixed  calling 
sequence,  then  named  common  or  some  other  means  must  be  used  to  pass  data  to  the  routine. 
This  is  incompatible  with  efficient  memory  management  for  massively  parallel  MIMD  and 
SIMD  machines. 

This  has  been  implemented  on  a  number  of  parallel  machines  including  the  CRAY-C90, 
Thinking  Machines  CM-200  and  CM-5,  Intel  Delta,  and  CRAY  T3D.  Parallel  performance 
on  the  C90  is  obtained  through  the  BLAS  operations  without  any  modification  to  the  serial 
code.  SIMD  performance  on  the  CM-200  is  also  relatively  straightforward.  All  of  the  BLAS 
operations  were  expressed  using  Fortran90  array  constructs  and  hence  were  automatically 
compiled  for  execution  on  the  SIMD  array  instead  of  the  front  end.  Operations  on  the 
projected  matrix  H  were  not  encoded  with  these  array  constructs  and  hence  were  auto¬ 
matically  scheduled  for  the  front  end.  The  only  additional  complication  was  to  define  the 
data  layouts  of  the  V  array  and  the  work  arrays  for  efficient  execution.  In  the  distributed 
memory  implementations,  the  reverse  communication  interface  provided  a  natural  way  to 
parallelize  the  ARPACK  codes  internally  without  imposing  a  fixed  parallel  decomposition 
on  the  user  supplied  matrix- vector  product. 

7.2.  Data  distribution  and  global  operations 

The  parallelization  strategy  for  distributed  memory  machines  consists  of  providing  the 
user  with  a  Single  Program  Multiple  Data  (SPMD)  template.  The  array  V  is  blocked 
and  distributed  across  the  processors.  The  projected  matrix  H  is  replicated.  The  SPMD 
program  looks  essentially  like  the  serial  code  except  that  the  local  block  Vloc  is  passed 
in  place  of  V.  The  work  space  is  partitioned  consistently  with  the  partition  of  V  and 
each  section  of  the  work  space  is  distributed  to  the  node  processors.  Thus  the  SPMD 
parallel  code  looks  very  similar  to  that  of  the  serial  code.  Assuming  a  parallel  version  of  the 
subroutine  matvec,  an  example  of  the  application  of  the  distributed  interface  is  illustrated 
as  the  follows: 

10  continue 

call  snaupd  (ido,  bmat,  nloc,  which,  ..., 
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*  Vloc,  work,  info) 
if  (ido  .  eq.  newprod)  then 

call  matvec  (’A5,  nloc,  workd(ipntr(l)) , 

*  workd ( ipnt r ( 2 ) ) ) 
else 

return 
endif 
go  to  10 

Where,  nloc  is  the  number  of  rows  in  the  block  Vloc  of  V  that  has  been  assigned  to  this 
node  process. 

Typically,  the  blocking  of  V  is  commensurate  with  the  parallel  decomposition  of  the 
matrix  A  as  well  as  with  the  configuration  of  the  distributed  memory  and  interconnection 
network.  Logically,  the  V'  matrix  be  partitioned  by  blocks 

vT  =  (v^T  ,  ....,v(nproc^T) 

with  one  block  per  processor  and  with  H  replicated  on  each  processor. 

The  explicit  steps  of  the  process  responsible  for  the  j  block  are: 

1.  f3k  =  gnorm(f{k*] );  «-  / /?; 

3.  z  <-  ( Aloc)vk+i ; 

4.  h,W  V^Tz;  h  *-  gsum(h^*'>)  fk+ 1  *-  z  -  Vk+i  h\ 

5.  Hk+ 1  -1—  ( Hk,h ); 

The  function  gnorm  at  Step  1  is  meant  to  represent  the  global  reduction  operation  of 
computing  the  norm  of  the  distributed  vector  fk  from  the  norms  of  the  local  segments 
fk  \  and  the  function  gsum  at  Step  4  is  meant  to  represent  the  global  sum  of  the  local 
vectors  h so  that  the  quantity  h  —  J2’j=i°C  is  available  to  each  process  on  completion. 
These  are  the  only  two  global  communication  points  within  this  algorithm.  The  remainder 
is  perfectly  parallel.  Additional  communication  will  typically  occur  at  Step  3.  Here  the 
operation  ( Aloc)v  is  meant  to  indicate  that  the  user  supplied  matrix- vector  product  is  able 
to  compute  the  local  segment  of  the  matrix-vector  product  Av  that  is  consistent  with  the 
partition  of  V.  Ideally,  this  would  only  involve  nearest  neighbor  communication  among  the 
processes. 

Since  H  is  replicated  on  each  processor,  the  parallelization  of  the  implicit  restart  mech¬ 
anism  described  by  Algorithm  6  remains  untouched.  The  only  difference  is  that  the  local 
block  V W  takes  the  place  of  the  full  matrix  V.  All  operations  on  the  matrix  H  are  repli¬ 
cated  on  each  processor.  Thus  there  is  no  communication  overhead  but  there  is  a  “serial 
bottleneck”  here  due  to  the  redundant  work.  If  k  is  small  relative  to  n,  this  bottleneck 
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is  insignificant.  However,  it  becomes  a  very  important  latency  issue  as  k  grows,  and  will 
prevent  scalability  if  k  grows  with  n  as  the  problem  size  increases. 

The  main  benefit  of  this  approach  is  that  the  changes  to  the  serial  version  of  ARPACK 
are  very  minimal.  Since  the  change  of  dimension  from  matrix  order  n  to  its  local  distributed 
blocksize  nloc  is  invoked  through  the  calling  sequence  of  the  subroutine  snaupd,  there  is 
no  essential  change  to  the  code.  Only  six  routines  were  effected,  and  these  in  a  minimal 
way.  These  routines  required  either  a  change  in  norm  calculation  for  distributed  vectors 
(Step  1)  or  in  the  distributed  dense  matrix-vector  product  (Step  4).  Since  the  vectors  are 
distributed,  norms  had  to  be  done  via  partial  (scaled)  dot  products  for  the  local  vector 
segments  and  then  a  global  sum  operation  was  used  to  complete  the  sum  of  the  squared 
norms  of  these  segments  on  all  processors.  More  specifically,  the  commands  are  changed 
from 

rnorm  =  sdot  (n,  resid,  1,  workd,  1) 
rnorm  =  sqrt(abs (rnorm)) 


to 


rnormO  *  sdot  (n,  resid,  1,  workd,  1) 
call  gssum(rnormO,l,tmp) 
rnormO  =  sqrt(abs (rnormO)) 
rnorm  =  rnormO 

Similarly,  the  computation  of  the  matrix-vector  product  operation  h  VTw  requires  a 
change  from 

call  sgemv  (’T’,  n,  j,  one,  v,  ldv,  workd(ipj),  1, 

*  zero,  h(l,j),  1) 

to 

call  sgemv  (’T’,  n,  j,  one,  v,  ldv,  workd(ipj),  1, 

*  zero,  h(l,j),  1) 
call  gssum(h(l,  j)  ,  j  ,h(l,  j+D) 

so  the  global  sum  operation  gssum  was  sufficient  to  implement  all  of  the  global  operations. 

7.3.  Distributed  memory  parallel  performance 

To  get  an  idea  of  the  potential  performance  of  ARPACK  on  distributed  memory  ma¬ 
chines  some  examples  have  been  run  on  the  Intel  Touchstone  DELTA.  The  examples  have 
been  designed  to  test  the  performance  of  the  software,  the  matrix  structure,  the  Touchstone 
DELTA  machine  architecture,  and  the  speedup  behavior  of  the  software  on  DELTA. 

The  user’s  implementation  of  the  matrix- vector  product  w  <—  Av  can  have  considerable 
effect  upon  the  parallel  performance.  Moreover,  there  is  a  fundamental  difficulty  in  testing 
how  the  performance  scales  as  the  problem  size  increases.  The  difficulty  is  that  the  prob¬ 
lem  often  becomes  increasingly  difficult  to  solve  as  the  size  increases  due  to  clustering  of 
eigenvalues.  The  tests  reported  here  attempt  to  isolate  and  measure  the  performance  of  the 
parallelization  of  the  ARPACK  routines  independently  of  the  matrix- vector  product. 
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Problem  size  Number  of  nodes  Total  Time  (s) 


3000*1 

1 

22.96 

3000*2 

2 

23.22 

3000*4 

4 

23.98 

3000*8 

8 

24.08 

3000*16 

16 

24.39 

3000*32 

32 

24.95 

3000*64 

64 

25.50 

3000*128 

128 

27.13 

3000*256 

256 

28.65 

Table  3:  Parallel  ARPACK  scaled  speedup  test  on  DELTA,  matrix  order  3,000  on  each 
node 

In  order  to  isolate  the  performance  of  the  ARPACK  routines  from  the  performance  of  the 
user’s  matrix- vector  product  and  also  to  isolate  effects  of  a  changing  problem  characteristics 
as  the  size  increases,  a  test  was  comprised  of  replicating  the  same  matrix  repeatedly  to  obtain 
a  block  diagonal  matrix.  Each  diagonal  block  corresponds  to  a  block  of  the  partitioned  and 
distributed  matrix  V.  This  is,  of  course,  a  completely  contrived  situation  that  allows  the 
workload  to  increase  linearly  with  the  number  of  processors.  Since  the  each  diagonal  block 
of  the  matrix  is  identical  the  algorithm  should  behave  as  if  nproc  identical  problems  are 
being  solved  simultaneously  as  long  as  the  initial  distributed  segments  of  tq  are  generated 
the  same.  Thus,  the  only  things  that  could  prevent  ideal  speedup  are  the  communication 
involved  in  the  global  operations  and  the  “serial  bottleneck”  associated  with  the  replicated 
operations  on  the  projected  matrix  H.  If  neither  of  these  were  present  then  one  would  expect 
the  execution  time  to  remain  constant  as  the  problem  size  and  the  number  of  processors 
increase. 

In  this  first  example,  each  diagonal  block  is  of  order  3,000,  which  is  identical  to  the 
vector  segment  size  on  each  node.  The  matrix-vector  product  operation  <—  (A/ocjuj^ 

is  executed  locally  on  each  node  processor  upon  the  distributed  vector  segments  ,  and 
there  is  no  communication  among  processors  involved  in  this  operation.  As  described  above, 
the  problem  size  in  increased  linearly  with  the  the  number  of  processors  by  adjoining  an 
additional  identical  diagonal  block  to  the  A  matrix  for  each  additional  processor.  The  global 
sum  operation  gssum  is  essentially  a  ring  algorithm  and  thus  has  a  linear  cost  with  respect 
to  the  number  of  nodes.  Since  the  diagonal  blocks  are  identical,  the  replicated  operations 
on  H  should  remain  the  same  as  the  problem  size  increases  and  hence  linear  speedup  is 
expected,  i.e.,  as  the  problem  size  increases  the  execution  time  should  remain  constant. 
This  ideal  speedup  is  very  nearly  achieved,  as  reflected  in  Table  3. 

The  second  example  is  obtained  from  a  similar  numerical  model  of  the  eigenproblem  of 
the  Laplacian  operator  defined  on  the  unit  square  with  square  with  Dirichlet  boundary  con¬ 
ditions  on  three  sides  and  a  Neuman  boundary  condition  on  the  fourth  side.  This  leads  to 
a  mildly  nonsymmetric  matrix  with  the  same  5-diagonal  structure  as  the  standard  2-D  dis¬ 
crete  Laplacian  with  a  5-point  stencil.  The  unit  square  {(x,y)|0  <  x,y  <  1}  was  discretized 
with  ^-direction  mesh  size  l/(n  +  l)  and  ^-direction  mesh  size  l/(m  +  l),  respectively.  Thus 
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Problem  size  Number  of  nodes  Total  Time  (s) 


2500*1 

1 

19.63 

2500*2 

2 

20.71 

2500*4 

4 

21.97 

2500*8 

8 

22.47 

2500*16 

16 

22.50 

2500*32 

32 

23.13 

2500*64 

64 

23.68 

2500*128 

128 

24.78 

2500*256 

256 

28.16 

Table  4:  Parallel  ARPACK  scaled  speedup  test  on  DELTA,  matrix  order  2,500  on  each 
node 

the  matrix  A  is  block  tridiagonal  and  of  order  N  =  nm  .  The  order  of  each  diagonal  block 
is  n,  and  the  number  of  diagonal  blocks  is  m. 

A  natural  way  to  carry  out  the  matrix- vector  product  operation  w  *-  Av  is  described 
as  follows.  A  standard  domain  decomposition  partitioning  of  the  unit  square  into  sub¬ 
rectangles  leads  to  a  parallel  matrix- vector  product  that  exchanges  data  only  across  the 
boundaries  of  the  subdomains  and  hence  needs  only  nearest  neighbor  connections.  The 
subdomains  are  naturally  chosen  so  that  the  blocking  of  the  matrix  is  commensurate  with 
the  blocking  and  distribution  of  the  V  array.  The  reverse  communication  interface  allows 
the  user  supplied  matrix- vector  product  to  take  advantage  of  the  matrix  structure.  Simple 
send  and  receive  operations  using  the  native  Intel  isend  and  irecv  are  used  to  carry  out  the 
nearest  neighbor  communication  operation. 

The  results  of  these  tests  are  given  in  Table  4  and  demonstrate  nearly  the  same  speedup 
as  in  Table  3.  The  relatively  minor  communication  to  receive  boundary  data  from  nearest 
neighbors  slightly  degraded  the  speedup. 

The  final  example  shows  how  dramatically  an  inefficient  matrix- vector  product  operation 
w  Av  and  also  how  problem  size  can  effect  performance.  A  naive  way  to  perform  the 
matrix- vector  product  would  be  to  collect  the  segments  of  the  vector  v  from  all  nodes  before 
the  operation,  and  then  distribute  the  segments  of  the  result  vector  w  to  each  node  after  the 
operation.  The  performance  of  this  scheme  is  shown  in  Table  5.  No  advantage  of  the  matrix 
structure  was  taken  in  computing  the  matrix- vector  product.  The  matrix  size  was  fixed  at 
n  =  3,200.  The  parallel  ARPACK  software  was  then  used  to  compute  the  eigenvalues  and 
eigenvectors.  A  residual  tolerance  of  (10-8)  was  imposed. 

Table  5  shows  the  total  time  and  the  number  of  iterations  required  to  solve  this  fixed 
problem  with  a  different  number  of  processors.  The  number  of  iterations  varied  with  dif¬ 
ferent  processor  configurations  and  this  is  attributed  to  different  initial  random  vectors 
being  generated  as  the  number  of  processors  changed.  However,  the  corresponding  result 
eigenvalues  and  eigenvectors  are  identical  for  all  of  the  runs. 

The  speedup  caused  by  increasing  the  number  of  processors  can  be  observed  by  checking 
the  average  run  time  per  iterate  for  each  individual  test.  The  fourth  column  in  Table  5, 
demonstrates  deteriorated  speedup  after  the  number  of  processors  exceeds  32.  Column  five 
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Nodes 

Time  (s) 

Iters. 

Ave.  TCe 

OP*x  Time 
Total  Time_ 

i 

1809.07 

173 

10.46 

0.84  % 

2 

1073.36 

189 

5.679 

1.48  % 

4 

732.72 

213 

3.440 

2.65  % 

8 

449.95 

225 

2.000 

5.24  % 

16 

201.27 

192 

1.048 

8.90  % 

32 

114.98 

154 

0.747 

13.3  % 

64 

161.24 

260 

0.620 

18.0  % 

128 

128.28 

210 

0.611 

25.9  % 

Table  5:  Parallel  ARPACK  fixed-size  speeedup  test,  matrix  order  3,200 
shows  that  the  reason  for  this  deterioration  lies  with  the  inefficient  matrix-vector  product. 

7.4.  General  applications  of  ARPACK 

ARPACK  has  been  used  in  a  variety  of  challenging  applications,  and  has  proven  to 
be  useful  both  in  symmetric  and  nonsymmetric  problems.  It  is  of  particular  interest  when 
there  is  no  opportunity  to  factor  the  matrix  and  employ  a  “shift  and  invert”  form  of  spectral 
transformation, 

A  «-  (A  -  a/)"1  .  (12) 

Existing  codes  often  rely  upon  this  transformation  to  enhance  convergence.  Extreme  eigen¬ 
values  {fj,}  of  the  matrix  A  are  found  very  rapidly  with  the  Arnoldi/Lanczos  process  and 
the  corresponding  eigenvalues  {A}  of  the  original  matrix  A  are  recovered  from  the  relation 
A  =  1  /n  +  a.  Implementation  of  this  transformation  generally  requires  a  matrix  factoriza¬ 
tion.  In  many  important  applications  this  is  not  possible  due  to  storage  requirements  and 
computational  costs.  The  implicit  restarting  technique  used  in  ARPACK  is  often  successful 
without  this  spectral  transformation. 

One  of  the  most  important  classes  of  application  arise  in  computational  fluid  dynamics. 
Here  the  matrices  are  obtained  through  discretization  of  the  Navier-Stokes  equations.  A  typ¬ 
ical  application  involves  linear  stability  analysis  of  steady  state  solutions.  Here  one  linearizes 
the  nonlinear  equation  about  a  steady  state  and  studies  the  stability  of  this  state  through 
the  examination  of  the  spectrum.  Usually  this  amounts  to  determining  if  the  eigenvalues  of 
the  discrete  operator  lie  in  the  left  halfplane.  Typically  these  are  parametrically  dependent 
problems;  the  analysis  consists  of  determining  phenomena  such  as  simple  bifurcation,  Hopf 
bifurcation  (an  imaginary  complex  pair  of  eigenvalues  cross  the  imaginary  axis),  turbulence, 
or  vortex  shedding  as  a  parameter  is  varied.  ARPACK  is  well  suited  to  this  setting  as  it  is 
able  to  track  a  specified  set  of  eigenvalues  while  they  vary  as  functions  of  the  parameter. 
Our  software  has  been  used  to  find  the  leading  eigenvalues  in  a  Couette- Taylor  wavy  vortex 
instability  problem  involving  matrices  of  order  4,000.  One  interesting  facet  of  this  applica¬ 
tion  is  that  the  matrices  are  not  available  explicitly  and  are  logically  dense.  The  particular 
discretization  provides  efficient  matrix-vector  products  through  Fourier  transform.  Details 
may  be  found  in  (Edwards  et  al.,  1994). 
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Very  large  symmetric  generalized  eigenproblems  arise  in  structural  analysis.  One  ex¬ 
ample  that  we  have  worked  with  at  Cray  Research  through  the  courtesy  of  Ford  Motor 
Company  involves  an  automobile  engine  model  constructed  from  3D  solid  elements.  Here 
the  interest  is  in  a  set  of  modes  to  allow  solution  of  a  forced  frequency  response  problem 
(K  -  A M)x  =  /(f),  where  f(t)  is  a  cyclic  forcing  function  which  is  used  to  simulate  ex¬ 
panding  gas  loads  in  the  engine  cylinder  as  well  as  bearing  loads  from  the  piston  connecting 
rods.  This  model  has  over  250,000  degrees  of  freedom.  The  smallest  eigenvalues  are  of 
interest  and  the  ARPACK  code  appears  to  be  very  competitive  with  the  best  commercially 
available  codes  on  problems  of  this  size.  For  details,  see  (Sorensen  et  ah,  1993). 

The  Singular  Value  Decomposition  (SVD)  may  also  be  computed  using  ARPACK  and 
possesses  many  large-scale  applications.  Two  SVD  applications  occur  in  computational 
biology.  The  first  of  these  is  the  3-D  image  reconstruction  of  biological  macromolecules 
from  2-D  projections  obtained  through  electron  micrographs.  The  second  is  an  application 
to  molecular  dynamical  simulation  of  the  motions  of  proteins.  The  SVD  may  be  used  to 
compress  the  data  required  to  represent  the  simulation  and  more  importantly  to  provide 
an  analytical  tool  to  help  in  understanding  the  function  of  the  protein.  See  (Romo  et  al., 
1994)  for  further  details  of  the  molecular  dynamics  application.  The  underlying  algorithm 
for  reconstructing  3-D  image  reconstruction  of  biological  macromolecules  from  2-D  pro¬ 
jections  (Van  Heel  and  Frank,  1981)  is  based  upon  the  statistical  technique  of  principal 
component  analysis  (Van  Huffle  and  Vandewalle,  1991).  In  this  algorithm,  a  singular  value 
decomposition  (SVD)  of  the  data  set  is  performed  to  extract  the  largest  singular  vectors, 
which  are  then  used  in  a  classification  procedure.  Our  initial  effort  has  been  to  replace  the 
existing  algorithm  for  computing  the  SVD  with  ARPACK  which  has  increased  the  speed  of 
the  analysis  by  a  factor  of  7  on  an  Iris  workstation.  The  accuracy  of  the  results  were  also 
increased  dramatically.  Details  are  reported  in  (Feinswog  et  al.,  in  preparation). 

Computational  chemistry  provides  a  rich  source  of  problems. 

ARPACK  is  being  used  in  two  applications  currently  and  holds  pro¬ 
mise  for  a  variety  of  challenging  problems  in  this  area.  We  are  collaborating  with  researchers 
at  Ohio  State  on  large-scale  three-dimensional  reactive  scattering  problems.  The  governing 
equation  is  the  Schroedinger  equation  and  the  computational  technique  for  studying  the 
physical  phenomena  relies  upon  repeated  eigenanalysis  of  a  Hamiltonian  operator  consisting 
of  a  Laplacian  operator  discretized  in  spherical  co-ordinates  plus  a  surface  potential.  The 
discrete  operator  has  a  tensor  product  structure  from  the  discrete  Laplacian  plus  a  diagonal 
matrix  from  the  potential.  The  resulting  matrix  has  a  block  structure  consisting  of  m  x 
m  blocks  of  order  n  .  The  diagonal  blocks  are  dense  and  the  off  diagonal  blocks  are 
scalar  multiples  of  the  order  n  identity  matrix.  It  is  virtually  impossible  to  factor  this 
matrix  directly  because  the  factors  are  dense  in  any  ordering.  We  are  using  a  distributed 
memory  parallel  version  of  ARPACK  together  with  some  preconditioning  ideas  to  solve 
these  problems  on  distributed  memory  machines.  Encouraging  computational  results  have 
been  obtained  on  Cray  Y-MP  machines  and  also  on  the  Intel  Delta  and  the  CM-5.  The 
code  has  recently  been  ported  to  the  CRAY  T3D  with  very  promising  results.  On  a  matrix 
of  order  12,800,  computing  the  smallest  eight  eigenvalues  using  a  Chebyshev  polynomial 
preconditioner  of  degree  eight,  the  CRAY  YMP  executed  at  a  rate  of  290.66  Mflop/s  while 
the  T3D  using  the  distributed-shared  memory  model  executed  at  a  peak  rate  of  1412  Mflop/s 
(See  Table  6).  For  details  about  the  method  and  experimental  results,  see  (Pendergast  et 
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Nprocs  Mflop/s 


2  172.50 

4  322.03 

8  586.29 

16  1006.60 

32  1412.73 

Table  6:  Parallel  ARPACK  fixed-size  computation  rate  test  on  T3D  Shared  Memory,  matrix 
order  12,800 

al.,  1994)  and  (Sorensen  et  al.,  1993). 

Nonsymmetric  problems  also  arise  in  quantum  chemistry.  Researchers  at  University  of 
Washington  have  used  the  code  to  investigate  the  effects  of  the  electric  field  on  InAs/GaSb 
and  GaAs/AirGai-a;  quantum  wells.  ARPACK  was  used  to  find  highly  accurate  solutions 
to  these  nonsymmetric  problems  which  defied  solution  by  other  means.  See  (Li  and  Kuhn, 
1993)  for  details.  Researchers  at  University  of  Massachusetts  have  used  ARPACK  to  solve 
eigenvalue  problems  arising  in  their  FEM  quantum  well  Kp  model  for  strained  layer  super¬ 
lattices  (Baliga  et  al.,  1994). 

A  final  example  of  nonsymmetric  eigenproblems  to  be  discussed  here  arises  in  magneto¬ 
hydrodynamics  (MHD)  involving  the  study  of  the  interaction  of  a  plasma  and  a  magnetic 
field.  The  MHD  equations  describe  the  macroscopic  behavior  of  the  plasma  in  the  magnetic 
field.  These  equations  form  a  system  of  coupled  nonlinear  PDEs.  Linear  stability  analysis 
of  the  linearized  MHD  equations  leads  to  a  complex  eigenvalue  problem.  Researchers  at 
the  Institute  for  Plasma  Physics  and  Utrecht  University  in  the  Netherlands  have  modified 
the  codes  in  ARPACK  to  work  in  complex  arithmetic  and  are  using  the  resulting  code  to 
obtain  very  accurate  approximations  to  the  eigenvalues  lying  on  the  Alfven  curve.  The  code 
is  not  only  computes  extremely  accurate  solutions,  it  does  so  very  efficiently  in  comparison 
to  other  methods  that  have  been  tried.  See  (Kooper  et  al.,  1993)  for  details. 

There  are  many  other  applications.  It  is  hoped  that  the  examples  briefly  mentioned  here 
indicate  the  versatility  of  the  ARPACK  software  as  well  as  the  wide  variety  of  eigenvalue 
problems  that  arise. 

8.  Conclusions 

This  paper  has  attempted  to  give  an  overview  of  the  numerical  solution  of  large-scale 
eigenvalue  problems.  Basic  theory  and  algorithms  were  introduced  to  motivate  Krylov 
subspace  projection  methods.  The  focus  has  been  on  a  particular  variant,  the  Implicitly 
Restarted  Arnoldi  Method,  which  has  been  developed  into  a  substantial  software  package, 
ARPACK. 

There  are  a  number  of  competing  methods  that  have  not  been  discussed  here  in  any 
detail.  Two  notable  methods  that  have  not  been  discussed  are  methods  based  on  the  non¬ 
symmetric  two-sided  Lanczos  process  and  methods  based  upon  subspace  iteration.  At  this 
point,  no  single  method  appears  to  be  viable  for  all  problems.  Certainly  in  the  nonsym¬ 
metric  case  there  is  no  “black  box”  technique  and  it  is  questionable  that  there  is  one  in  the 
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symmetric  case  either.  A  block  method  called  ABLE  based  upon  two-sided  nonsymmetric 
Lanczos  is  being  developed  by  Bai,  Day  and  Ye  (1995).  Software  based  upon  subspace  iter¬ 
ation  with  Chbeychev  acceleration  has  been  developed  by  Duff  and  Scott  (1993).  Jennifer 
Scott  has  also  developed  software  based  upon  an  explicitly  restarted  Chebyshev-Arnoldi 
method  (Scott,  1993).  Finally,  the  Rational  Krylov  method  being  developed  by  Ruhe  (1984 
and  1994)  is  very  promising  for  the  nonsymmetric  problem  when  a  factorization  of  the 
matrix  is  possible. 

The  computational  results  presented  in  Section  7  are  due  to  Zdenko  Tomasic  and  Dan 
Hu.  I  would  like  to  thank  Rich  Lehoucq  for  producing  Figures  1-3  and  for  constructive 
comments  and  discussions  about  this  work. 

Financial  support  for  this  work  was  provided  in  part  by  the  National  Science  Founda¬ 
tion  cooperative  agreement  CCR-912008,  by  ARPA  contract  number  DAAL03-91-C-0047 
(administered  by  the  U.S.  Army  Research  Office),  and  by  the  National  Science  Foundation 
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